Abstract — We develop a theory of the algorithmic information
in bits contained in an individual pure quantum state.
This extends classical Kolmogorov complexity to the quan- 
tum domain retaining classical descriptions. Quantum Kol- 
mogorov complexity coincides with the classical Kolmogorov 
complexity on the classical domain. Quantum Kolmogorov 
complexity is upper bounded and can be effectively approx- 
imated from above under certain conditions. With high 
probability a quantum object is incompressible. Upper- and 
lower bounds of the quantum complexity of multiple copies 
of individual pure quantum states are derived and may shed 
some light on the no-cloning properties of quantum states. 
In the quantum situation complexity is not sub-additive. We 
discuss some relations with "no-cloning" and "approximate 
cloning" properties. 

Keywords — Algorithmic information theory, quantum; 
classical descriptions of quantum states; information the- 
ory, quantum; Kolmogorov complexity, quantum; quantum 
cloning. 



I. Introduction 

Quantum information theory, the quantum mechanical analogue of
classical information theory [?], is experiencing a renaissance due
to the rising interest in the notion of quantum computation and the
possibility of realizing a quantum computer [16]. While Kolmogorov
complexity [12] is the accepted absolute measure of information
content in an individual classical finite object, a similar
absolute notion is needed for the information content of an
individual pure quantum state. One motivation is to extend 
probabilistic quantum information theory to Kolmogorov's 
absolute individual notion. Another reason is to try and 
duplicate the success of classical Kolmogorov complexity as 
a general proof method in applications ranging from com- 
binatorics to the analysis of algorithms, and from pattern 
recognition to learning theory [13]. We propose a theory
of quantum Kolmogorov complexity based on classical de- 
scriptions and derive the results given in the abstract. A 
preliminary partial version appeared as [19].

What are the problems and choices to be made in developing a
theory of quantum Kolmogorov complexity? Quantum theory assumes
that every complex vector of unit length represents a realizable
pure quantum state [17].
There arises the question of how to design the equipment 
that prepares such a pure state. While there are contin- 
uously many pure states in a finite-dimensional complex 
vector space, corresponding to all vectors of unit length,
we can finitely describe only a countable subset. Imposing 
effectiveness on such descriptions leads to constructive
procedures. The most general such procedures satisfying universally
agreed-upon logical principles of effectiveness are quantum Turing
machines [?]. To define quantum Kolmogorov complexity by way of
quantum Turing machines
leaves essentially two options: 

1. We want to describe every quantum superposition ex- 
actly; or 

2. we want to take into account the number of bits/qubits
in the specification as well as the accuracy of the quantum
state produced.

We have to deal with three problems: 

• There are continuously many quantum Turing machines; 

• There are continuously many pure quantum states; 

• There are continuously many qubit descriptions.

There are uncountably many quantum Turing machines
only if we allow arbitrary real rotations in the definition of
machines. Then a quantum Turing machine can only be
universal in the sense that it can approximate the computation
of an arbitrary machine [?]. In descriptions using
universal quantum Turing machines we would have to account
for the closeness of approximation, the number of
steps required to get this precision, and the like. In contrast,
if we fix the rotation of all contemplated machines to
a single primitive rotation $\theta$ with $\cos\theta = 3/5$ and $\sin\theta = 4/5$,
then there are only countably many Turing machines and
the universal machine simulates the others exactly [?]. Every
quantum Turing machine computation that uses arbitrary
real rotations to obtain a target pure quantum state can
be approximated to every precision by machines with fixed
rotation $\theta$, but in general cannot be simulated exactly,
just as in the case of the simulation of arbitrary quantum
Turing machines by a universal quantum Turing machine.
Since exact simulation by a fixed universal quantum Turing
machine is impossible anyhow, but arbitrarily close approximations
are possible by Turing machines using a fixed
rotation like $\theta$, we are motivated to fix $Q_1, Q_2, \ldots$ as a standard
enumeration of quantum Turing machines using only
rotation $\theta$.
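
As a rough numerical illustration of why one fixed rotation can stand in for arbitrary ones, assume the primitive rotation has $\cos\theta = 3/5$ and $\sin\theta = 4/5$. Because $3/5$ is rational and not one of $0, \pm 1/2, \pm 1$, the angle $\theta$ is an irrational multiple of $\pi$, so the multiples $k\theta$ come arbitrarily close to any target angle modulo $2\pi$. The sketch below finds such a $k$ by brute force. The function names, the target angle, and the search bound are assumptions made for this illustration; it approximates a single rotation angle and is not a simulation of a quantum Turing machine.

```python
import math

# Assumed primitive rotation: cos(theta) = 3/5, sin(theta) = 4/5.
THETA = math.atan2(4.0, 3.0)


def angular_distance(a: float, b: float) -> float:
    """Distance between two angles on the circle, in [0, pi]."""
    d = (a - b) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)


def approximate_rotation(target: float, eps: float, max_power: int = 1_000_000) -> int:
    """Return the smallest k >= 1 with k*THETA within eps of target (mod 2*pi).

    Hypothetical helper: plain brute-force search, for illustration only.
    """
    for k in range(1, max_power + 1):
        if angular_distance(k * THETA, target) <= eps:
            return k
    raise ValueError("no suitable power found below max_power")


if __name__ == "__main__":
    target = math.pi / 4.0  # arbitrary example angle
    for eps in (1e-1, 1e-2, 1e-3):
        k = approximate_rotation(target, eps)
        err = angular_distance(k * THETA, target)
        print(f"eps={eps:g}: k={k}, error={err:.2e}")
```

Tighter precision needs a larger power $k$, which mirrors the remark that approximation quality has to be paid for in the number of steps.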

Our next question is whether we want programs (descrip- 
tions) to be in classical bits or in qubits? The intuitive no- 
tion of computability requires the programs to be classical. 
Namely, to prepare a quantum state requires a physical ap- 
paratus that "computes" this quantum state from classical 
specifications. Since such specifications have effective de- 
scriptions, every quantum state that can be prepared can 
be described effectively in descriptions consisting of classical
bits. Descriptions consisting of arbitrary pure quantum states
allow noncomputable (or hard to compute) information to be hidden
in the bits of the amplitudes. In Definition [?] we call a pure
quantum state directly computable if
there is a (classical) program such that the universal quan- 
tum Turing machine computes that state from the program 
and then halts in an appropriate fashion. In a computa- 
tional setting we naturally require that directly computable 
pure quantum states can be prepared. By repeating the 
preparation we can obtain arbitrarily many copies of the 
pure quantum state. 

If descriptions are not effective then we are not going to 
use them in our algorithms except possibly on inputs from 
an "unprepared" origin. Every quantum state used in a 
quantum computation arises from some classical preparation
or is possibly captured from some unknown origin.
If the latter, then we can consume it as conditional side- 
information or an oracle. 

Restricting ourselves to an effective enumeration of quan- 
tum Turing machines and classical descriptions to describe 
by approximation continuously many pure quantum states 
is reminiscent of the construction of continuously many real 
numbers from Cauchy sequences of rational numbers, the 
rationals being effectively enumerable. 

Kolmogorov complexity: We summarize some basic
definitions in Appendix A (see also this journal [20]) in
order to establish notations and recall the notion of shortest
effective descriptions. More details can be found in the
textbook [13]. Shortest effective descriptions are "effective"
in the sense that they are programs: we can compute the
described objects from them. Unfortunately [12], there
is no algorithm that computes the shortest program and 
then halts, that is, there is no general method to compute 
the length of a shortest description (the Kolmogorov com- 
plexity) from the object being described. This obviously 
impedes actual use. Instead, one needs to consider com- 
putable approximations to shortest descriptions, for exam- 
ple by restricting the allowable approximation time. Apart 
from computability and approximability, there is another 
property of descriptions that is important to us. A set of 
descriptions is prefix-free if no description is a proper prefix 
of another description. Such a set is called a prefix code. 
Since a code message consists of concatenated code words, 
we have to parse it into its constituent code words to re- 
trieve the encoded source message. If the code is uniquely 
decodable, then every code message can be decoded in only 
one way. The importance of prefix-codes stems from the 
fact that (i) they are uniquely decodable from left to right 
without backing up, and (ii) for every uniquely decodable 
code there is a prefix code with the same length code words. 
Therefore, we can restrict ourselves to prefix codes. In our 
setting we require the set of programs to be prefix-free and 
hence to be a prefix-code for the objects being described. It 
is well-known that with every prefix-code there corresponds
a probability distribution $P(\cdot)$ such that the prefix-code
is a Shannon-Fano code [?] that assigns prefix code length
$l_x = -\log P(x)$ to $x$, irrespective of the regularities in $x$.

1 In what follows, "log" denotes the binary logarithm. 



For example, with the uniform distribution $P(x) = 2^{-n}$
on the set of n-bit source words, the Shannon-Fano code 
word length of an all-zero source word equals the code word 
length of a truly irregular source word. The Shannon-Fano 
code gives an expected code word length close to the en- 
tropy, and, by Shannon's Noiseless Coding Theorem, it 
possesses the optimal expected code word length. But the 
Shannon-Fano code is not optimal for individual elements: 
it does not take advantage of the regularity in some ele- 
ments to encode those shorter. In contrast, one can view 
the Kolmogorov complexity $K(x)$ as the code word length
of the shortest program $x^*$ for $x$, the set of shortest programs
constituting the Shannon-Fano code of the so-called
"universal distribution" $m(x) = 2^{-K(x)}$. The code consisting
of the shortest programs has the remarkable property
that it achieves (i) an expected code length that is about 
optimal since it is close to the entropy, and simultaneously, 
(ii) every individual object is coded as short as is effectively 
possible, that is, squeezing out all regularity. In this sense 
the set of shortest programs constitutes the optimal effec- 
tive Shannon-Fano code, induced by the optimal effective 
distribution (the universal distribution). 
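
The contrast between distribution-based code lengths and per-object regularity can be sketched numerically. The snippet below computes Shannon-Fano lengths $\lceil -\log P(x) \rceil$ under the uniform distribution, checks the Kraft sum of the resulting prefix code, and then uses `zlib` as a crude computable stand-in for "squeezing out regularity". Using a general-purpose compressor is an assumption made for illustration: it only gives an upper-bound proxy for $K(x)$ and is not a shortest program. All function names are hypothetical.

```python
import math
import random
import zlib


def shannon_fano_length(p: float) -> int:
    """Code word length ceil(-log2 p) for an outcome of probability p."""
    return math.ceil(-math.log2(p))


def kraft_sum(lengths) -> float:
    """Sum of 2^-l over code word lengths; at most 1 for any prefix code."""
    return sum(2.0 ** -length for length in lengths)


def compressed_bits(data: bytes) -> int:
    """Bits in a zlib encoding of data: a rough upper-bound proxy for K."""
    return 8 * len(zlib.compress(data, 9))


def pseudo_random_bytes(n: int, seed: int = 0) -> bytes:
    """Irregular-looking bytes. They come from a short seeded program, so
    their true Kolmogorov complexity is small; zlib just cannot see that."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))


if __name__ == "__main__":
    n = 10
    # Uniform distribution on n-bit words: every word gets n bits,
    # whether it is all zeros or irregular.
    print("Shannon-Fano length:", shannon_fano_length(2.0 ** -n))
    print("Kraft sum:", kraft_sum([n] * 2 ** n))
    # A compressor does exploit regularity in individual objects.
    print("all-zero block:", compressed_bits(bytes(1024)), "bits")
    print("irregular block:", compressed_bits(pseudo_random_bytes(1024)), "bits")
```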

Quantum Computing: We summarize some basic definitions
in Appendix B in order to establish notations and
briefly review the notion of a quantum Turing machine
computation. See also this journal's survey [?] on quantum
information theory. More details can be found in the
textbook [16]. Loosely speaking, just as randomized computation
is a generalization of deterministic computation, so
is quantum computation a generalization of randomized
computation. Realizing a mathematical random source to
drive a random computation is, in its ideal form, presum- 
ably impossible (or impossible to certify) in practice. Thus, 
in applications an algorithmic random number generator is 
used. Strictly speaking this invalidates the analysis based 
on mathematical randomized computation. As John von
Neumann [15] put it: "Any one who considers arithmetical
methods of producing random digits is, of course, in a state 
of sin. For, as has been pointed out several times, there is 
no such thing as a random number — there are only meth- 
ods to produce random numbers, and a strict arithmetical 
procedure is of course not such a method." In practice ran- 
domized computations reasonably satisfy theoretical anal- 
ysis. In the quantum computation setting, the practical 
problem is that the ideal coherent superposition cannot re- 
ally be maintained during computation but deteriorates — it 
decoheres. In our analysis we abstract from that problem 
and one hopes that in practice anti-decoherence techniques 
will suffice to approximate the idealized performance
sufficiently.

We view a quantum Turing machine as a generalization 
of the classic probabilistic (that is, randomized) Turing 
machine. The probabilistic Turing machine computation 
follows multiple computation paths in parallel, each path 
with a certain associated probability. The quantum Turing 
machine computation follows multiple computation paths 
in parallel, but now every path has an associated complex 
probability amplitude. If it is possible to reach the same 
state via different paths, then in the probabilistic case the
probability of observing that state is simply the sum of the
path probabilities. In the quantum case it is the squared
norm of the summed path probability amplitudes. Since
the probability amplitudes can be of opposite sign, the
observation probability can vanish; if the path probability
amplitudes are of equal sign, then the observation probability
can get boosted, since it is the squared norm of the sum.
While this generalizes the probabilistic aspect, and boosts 
the computation power through the phenomenon of inter- 
ference between parallel computation paths, there are extra 
restrictions vis-a-vis probabilistic computation in that the 
quantum evolution must be unitary. 
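
A minimal sketch of the difference between adding path probabilities and adding path amplitudes follows. The two-path numbers are chosen for illustration (each path has amplitude of magnitude 1/2, hence path probability 1/4), and the function names are hypothetical.

```python
def classical_probability(path_probabilities):
    """Probabilistic case: probability of a state is the sum over paths."""
    return sum(path_probabilities)


def quantum_probability(path_amplitudes):
    """Quantum case: squared norm of the summed path amplitudes."""
    return abs(sum(path_amplitudes)) ** 2


if __name__ == "__main__":
    # Two paths into the same state, each with probability 1/4.
    print("probabilistic:", classical_probability([0.25, 0.25]))
    # Equal signs: the amplitudes reinforce each other.
    print("constructive: ", quantum_probability([0.5, 0.5]))
    # Opposite signs: the amplitudes cancel and the state is never observed.
    print("destructive:  ", quantum_probability([0.5, -0.5]))
    # Complex phases interpolate between the two extremes.
    print("phase i:      ", quantum_probability([0.5, 0.5j]))
```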

Quantum Kolmogorov Complexity: We define the 
Kolmogorov complexity of a pure quantum state as the 
length of the shortest two-part code consisting of a classical 
program to compute an approximate pure quantum state 
and the negative log-fidelity of the approximation to the 
target quantum state. We show that the resulting quantum 
Kolmogorov complexity coincides with the classical self- 
delimiting complexity on the domain of classical objects; 
and that certain properties that we love and cherish in the 
classical Kolmogorov complexity are shared by the new 
quantum Kolmogorov complexity: quantum Kolmogorov 
complexity of an n-qubit object is upper bounded by about 
2n; it is not computable but can under certain conditions 
be approximated from above by a computable process; and 
with high probability a quantum object is incompressible. 
We may call this quantum Kolmogorov complexity the bit
complexity of a pure quantum state $|\phi\rangle$ (using Dirac's "ket"
notation) and denote it by $K(|\phi\rangle)$. From now on, we will
denote by $\stackrel{+}{<}$ an inequality to within an additive constant,
and by $\stackrel{+}{=}$ the situation when both $\stackrel{+}{<}$ and $\stackrel{+}{>}$ hold. For example,
we will show that, for $n$-qubit states $|\phi\rangle$, the complexity
satisfies $K(|\phi\rangle \mid n) \stackrel{+}{<} 2n$. For certain restricted pure quantum
states, quantum Kolmogorov complexity satisfies the
sub-additive property: $K(|\phi,\psi\rangle) \stackrel{+}{<} K(|\phi\rangle) + K(|\psi\rangle \mid |\phi\rangle)$.
But, in general, quantum Kolmogorov complexity is not
sub-additive. Although "cloning" of non-orthogonal states
is forbidden in the quantum setting [21], [?], $m$ copies of
the same quantum state have combined complexity that
can be considerably lower than $m$ times the complexity
of a single copy. In fact, quantum Kolmogorov complexity
appears to enable us to express and partially quantify
"non-clonability" and "approximate clonability" of individual
pure quantum states.

Related Work: In the classical situation there are sev- 
eral variants of Kolmogorov complexity that are very mean- 
ingful in their respective settings: plain Kolmogorov com- 
plexity, prefix complexity, monotone complexity, uniform 
complexity, negative logarithm of universal measure, and so 
on [13]. It is therefore not surprising that in the more
complicated situation of quantum information several different
choices of complexity can be meaningful and unavoidable 
in different settings. Following the preliminary version [19]
of this work there have been alternative proposals: 

Qubit Descriptions: The most straightforward way to
define a notion of quantum Kolmogorov complexity is to
consider the shortest effective qubit description of a pure
quantum state, which is studied in [?]. (This qubit complexity
can also be formulated in terms of the conditional
version of bit complexity as in [19].) An advantage of qubit
complexity is that the upper bound on the complexity of a
pure quantum state is immediately given by the number of
qubits involved in the literal description of that pure quantum
state. Let us denote the resulting qubit complexity of
a pure quantum state $|\phi\rangle$ by $KQ(|\phi\rangle)$.

While it is clear that (just as with the previous approach)
the qubit complexity is not computable, it is unlikely that
one can approximate the qubit complexity from above by a
computable process in some meaningful sense. In particular,
the dovetailing method used in our approach does not
seem applicable here, because the potential qubit program
candidates are uncountable. The quantitative
incompressibility properties are much like the classical case
(this is important for future applications). There are some
interesting exceptions in the case of objects consisting of multiple
copies, related to the "no-cloning" property of quantum
objects [21], [?]. Qubit complexity does not satisfy the
sub-additive property, and a certain version of it (bounded
fidelity) is bounded above by the von Neumann entropy.

Density Matrices: In classical algorithmic information
theory it turns out that the negative logarithm of the
"largest" probability distribution effectively approximable
from below (the universal distribution) coincides with the
self-delimiting Kolmogorov complexity. In [?] Gács defines
two notions of complexity based on the negative logarithm
of the "largest" density matrix $\mu$ effectively approximable
from below. There arise two different complexities
of $|\phi\rangle$ based on whether we take the logarithm inside,
as $KG(|\phi\rangle) = -\langle\phi| \log\mu |\phi\rangle$, or outside, as
$Kg(|\phi\rangle) = -\log \langle\phi| \mu |\phi\rangle$. It turns out that
$Kg(|\phi\rangle) \stackrel{+}{<} KG(|\phi\rangle)$.
This approach serves to compare the two approaches above:
It was shown that $Kg(|\phi\rangle)$ is within a factor four of $K(|\phi\rangle)$;
that $KG(|\phi\rangle)$ essentially is a lower bound on $KQ(|\phi\rangle)$; and
that an oracle version of $KG$ is essentially an upper bound on
qubit complexity $KQ$. Since qubit complexity is trivially
$\stackrel{+}{<} n$ and it was shown that bit complexity is typically close
to $2n$, at first glance this leaves the possibility that the two
complexities are within a factor two of each other. This
turns out not to be the case, since it was shown that the $Kg$
complexity can for some arguments be much smaller than
the $KG$ complexity, so that the bit complexity is in these
cases also much smaller than the qubit complexity. As
[?] states, this is due to the permissive way the bit complexity
deals with approximation. The von Neumann entropy of a
computable density matrix is within an additive constant
(the complexity of the program computing the density matrix)
of a notion of average complexity. The drawback of
density matrix based complexity is that we seem to have
lost the direct relation with a meaningful interpretation in
terms of description length: a crucial aspect of classical
Kolmogorov complexity in most applications [13].
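
The gap between taking the logarithm inside and outside the expectation can be seen on a toy example. The sketch below is an assumption-laden illustration: the density matrix $\mu$ of the text is not computable, so a small diagonal density matrix stands in for it, and the state is given by its amplitudes in the eigenbasis of that matrix. The function names are hypothetical.

```python
import math


def log_inside(eigenvalues, amplitudes):
    """-<phi| log mu |phi> for mu diagonal with the given eigenvalues,
    phi written in the eigenbasis of mu (the KG-style quantity)."""
    return -sum(abs(a) ** 2 * math.log2(p) for p, a in zip(eigenvalues, amplitudes))


def log_outside(eigenvalues, amplitudes):
    """-log <phi| mu |phi> for the same inputs (the Kg-style quantity)."""
    return -math.log2(sum(abs(a) ** 2 * p for p, a in zip(eigenvalues, amplitudes)))


if __name__ == "__main__":
    mu = [0.5, 0.25, 0.125, 0.125]      # toy stand-in, eigenvalues sum to 1
    spread = [0.5, 0.5, 0.5, 0.5]       # uniform superposition of eigenvectors
    basis = [1.0, 0.0, 0.0, 0.0]        # a single eigenvector
    # By concavity of the logarithm the "outside" value never exceeds
    # the "inside" value; they agree on eigenvectors of mu.
    print(log_outside(mu, spread), log_inside(mu, spread))
    print(log_outside(mu, basis), log_inside(mu, basis))
```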

Real Descriptions: A version of quantum Kolmogorov
complexity briefly considered in [19] uses computable real
parameters to describe the pure quantum state with complex
probability amplitudes. This requires two reals per
complex probability amplitude, that is, for $n$ qubits one
requires $2^{n+1}$ real numbers in the worst case. A real number
is computable if there is a fixed program that outputs
consecutive bits of the binary expansion of the number forever.
Since every computable real number may require a
separate program, a computable $n$-qubit pure state may require
$2^{n+1}$ finite programs. Most $n$-qubit pure states have
parameters that are noncomputable, and increased precision
will require increasingly long programs. For example,
if the parameters are recursively enumerable (the positions
of the "1"s in the binary expansion form a recursively
enumerable set), then a program of length $\log k$ per parameter,
to achieve $k$ bits precision per recursively enumerable
real, is sufficient and for some recursively enumerable reals
also necessary. In certain contexts where the approximation
of the real parameters is a central concern, such
considerations may be useful. While this approach does
not allow the development of a clean theory in the sense
of the previous approaches, it can be directly developed
in terms of algorithmic thermodynamics, an extension of
Kolmogorov complexity to randomness of infinite sequences
(such as binary expansions of real numbers) in terms of
coarse-graining and sequential Martin-Löf tests, analogous
to the classical case in [?], [?]. But this is outside the
scope of the present paper.

II. Quantum Turing Machine Model 

We assume the notation and definitions in Appendices A and B.
Our model of computation is a quantum Turing machine
equipped with an input tape that is one-way infinite,
with the classical input (the program) in binary, left-adjusted
from the beginning. We require that the input tape
is read-only from left to right without backing up. This
automatically yields a property we require in the sequel:
The set of halting programs is prefix-free. Additionally, the
machine contains a one-way infinite work tape containing
qubits, a one-way infinite auxiliary tape containing qubits,
and a one-way infinite output tape containing qubits. Initially,
the input tape contains a classical binary program $p$,
and all (qu)bits of the work tape, auxiliary tape, and output
tape are set to $|0\rangle$. In case the Turing machine
has an auxiliary input (classical or quantum), then initially
the leftmost qubits of the auxiliary tape contain this input.
A quantum Turing machine $Q$ with classical program
$p$ and auxiliary input $y$ computes until it halts with output
$Q(p,y)$ on its output tape, or it computes forever. Halting
is a more complicated matter here than in the classical
case since quantum Turing machines are reversible, which
means that there must be an ongoing evolution with non-repeating
configurations. There are various ways to resolve
this problem [?] and we do not discuss this matter further.
We only consider quantum Turing machines that do not
modify the output tape after halting. Another, related,
problem is that after halting the quantum state on the output
tape may be "entangled" with the quantum state of the
remainder of the machine, that is, the input tape, the finite
control, the work tape, and the auxiliary tape. This has
the effect that the output state viewed in isolation may not
be a pure quantum state but a mixture of pure quantum
states. This problem does not arise if the output and the
remainder of the machine form a tensor product so that the
output is un-entangled with the remainder. The results in
this paper are invariant under these different assumptions,
but considering output entangled with the remainder of
the machine complicates formulas and calculations. Correspondingly,
we restrict consideration to outputs that form
a tensor product with the remainder of the machine, with
the understanding that the same results hold with about
the same proofs if we choose the other option, except in
the case of Theorem [?], item (ii); see the pertinent caveat
there. Note that the Kolmogorov complexity based on entangled
output tapes is at most (and conceivably less than)
the Kolmogorov complexity based on un-entangled output
tapes.

Definition 1: Define the output $Q(p,y)$ of a quantum
Turing machine $Q$ with classical program $p$ and auxiliary
input $y$ as the pure quantum state $|\psi\rangle$ resulting from
$Q$ computing until it halts with output $|\psi\rangle$ on its output
tape. Moreover, $|\psi\rangle$ doesn't change after halting, and
it is un-entangled with the remainder of $Q$'s configuration.
We write $Q(p,y) < \infty$. If there is no such $|\psi\rangle$
then $Q(p,y)$ is undefined and we write $Q(p,y) = \infty$. By
definition the input tape is read-only from left to right
without backing up: therefore the set of halting programs
$\mathcal{P}_y = \{p : Q(p,y) < \infty\}$ is prefix-free: no program in $\mathcal{P}_y$
is a proper prefix of another program in $\mathcal{P}_y$. Put differently,
the Turing machine scans all of a halting program
$p$ but never scans the bit following the last bit of $p$: it is
self-delimiting.
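
The link between a one-way read-only input tape and a prefix-free set of halting programs can be checked on a classical toy model. In the sketch below, a program counts as halting only if the machine stops after reading exactly all of its bits, never requesting a bit beyond the end. The tape class, the toy machine (a unary length header followed by that many payload bits), and all names are hypothetical illustrations, not the quantum machine of the text.

```python
from itertools import product


class OutOfInput(Exception):
    """Raised when the machine asks for a bit beyond the end of the program."""


class OneWayTape:
    """Read-only input tape, scanned left to right without backing up."""

    def __init__(self, bits: str):
        self._bits = bits
        self._pos = 0

    def read(self) -> str:
        if self._pos >= len(self._bits):
            raise OutOfInput
        bit = self._bits[self._pos]
        self._pos += 1
        return bit

    def fully_read(self) -> bool:
        return self._pos == len(self._bits)


def toy_machine(tape: OneWayTape) -> str:
    """Hypothetical machine: read 1^k 0, then k payload bits, then halt."""
    k = 0
    while tape.read() == "1":
        k += 1
    return "".join(tape.read() for _ in range(k))


def halts_on(program: str) -> bool:
    """True if the machine halts having scanned exactly the whole program."""
    tape = OneWayTape(program)
    try:
        toy_machine(tape)
    except OutOfInput:
        return False
    return tape.fully_read()


def halting_programs(max_len: int):
    words = ("".join(bits) for n in range(max_len + 1) for bits in product("01", repeat=n))
    return [w for w in words if halts_on(w)]


def is_prefix_free(words) -> bool:
    return not any(a != b and b.startswith(a) for a in words for b in words)


if __name__ == "__main__":
    programs = halting_programs(8)
    print(len(programs), "halting programs, prefix-free:", is_prefix_free(programs))
```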

We fix the rotation of all contemplated machines to a single
primitive rotation $\theta$ with $\cos\theta = 3/5$ and $\sin\theta = 4/5$. There
are only countably many such Turing machines. Using a
standard ordering, we fix $Q_1, Q_2, \ldots$ as a standard enumeration
of quantum Turing machines using only rotation $\theta$.
By [?], there is a universal machine $U$ in this enumeration
that simulates the others exactly: $U(1^i 0 p, y) = Q_i(p,y)$,
for all $i, p, y$. (Instead of the many-bit encoding $1^i 0$ for
$i$ we can use a shorter self-delimiting code like $i'$ in
Appendix A.) As noted in the Introduction, every quantum
Turing machine computation using arbitrary real rotations
can be approximated to arbitrary precision by machines
with fixed rotation $\theta$ but in general cannot be simulated
exactly.
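
To make the remark about index encodings concrete, the sketch below compares the unary code $1^i 0$ with one standard shorter self-delimiting code: the binary expansion of $i$ preceded by its length in unary. This particular shorter code is an assumption chosen for illustration; the code $i'$ of the appendix is not reproduced in this section and may differ. Function names are hypothetical.

```python
def unary_code(i: int) -> str:
    """The encoding 1^i 0 of index i: i + 1 bits."""
    return "1" * i + "0"


def short_code(i: int) -> str:
    """Assumed shorter code: 1^len(b) 0 b, with b the binary expansion of i.
    Its length is 2*len(b) + 1, logarithmic in i."""
    b = format(i, "b")
    return "1" * len(b) + "0" + b


def decode_short(stream: str):
    """Split a stream into (index, remainder) without any end marker."""
    n = stream.index("0")               # unary header gives the length of b
    b = stream[n + 1 : 2 * n + 1]
    return int(b, 2), stream[2 * n + 1 :]


if __name__ == "__main__":
    i, program = 1000, "0110"
    print(len(unary_code(i)), "bits in unary")
    print(len(short_code(i)), "bits in the shorter code")
    # The index can be peeled off the front of the concatenation i' p.
    print(decode_short(short_code(i) + program))
```

Either code lets the universal machine find where the index ends and the simulated program begins, which is what the self-delimiting requirement is for.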

Remark 1: There are two possible interpretations for the
computation relation $Q(p,y) = |x\rangle$. In the narrow interpretation
we require that $Q$ with $p$ on the input tape and $y$ on
the conditional tape halts with $|x\rangle$ on the output tape. In
the wide interpretation we can define pure quantum states
by requiring that for every precision parameter $k > 0$ the
computation of $Q$ with $p$ on the input tape and $y$ on the
conditional tape, with $k$ on a special new tape where the
precision is to be supplied, halts with $|x'\rangle$ on the output
tape and $\|\langle x | x'\rangle\|^2 > 1 - 1/2^k$. Such a notion of "computable"
or "recursive" pure quantum states is similar to
Turing's notion of "computable numbers." In the remainder
of this section we use the narrow interpretation.

Remark 2: As remarked in [?], the notion of a quantum
computer is not essential to the theory here or in
[?], [?]. Since the computation time of the machine is
not limited in the theory of description complexity as developed
here, a quantum computer can be simulated by
a classical computer to every desired degree of precision.
We can rephrase everything in terms of the standard enumeration
$T_1, T_2, \ldots$ of classical Turing machines. Let
$|x\rangle = \sum_{i=0}^{N-1} \alpha_i |e_i\rangle$ ($N = 2^n$) be an $n$-qubit state. We can
write $T(p) = |x\rangle$ if $T$ either outputs

(i) algebraic definitions of the coefficients of $|x\rangle$ (in case
these are algebraic), or

(ii) a sequence of approximations $(\alpha_{0,k}, \ldots, \alpha_{N-1,k})$ for
$k = 1, 2, \ldots$ where $\alpha_{i,k}$ is an algebraic approximation of $\alpha_i$
to within $2^{-k}$.

III. Classical Descriptions of Pure Quantum States

The complex quantity $\langle x | z\rangle$ is the inner product of vectors
$\langle x|$ and $|z\rangle$. Since pure quantum states $|x\rangle, |z\rangle$ have
unit length, $\|\langle x | z\rangle\| = |\cos\theta|$ where $\theta$ is the angle between
vectors $|x\rangle$ and $|z\rangle$. The quantity $\|\langle x | z\rangle\|^2$, the fidelity
between $|x\rangle$ and $|z\rangle$, is a measure of how "close" or "confusable"
the vectors $|x\rangle$ and $|z\rangle$ are. It is the probability of
outcome $|x\rangle$ being measured from state $|z\rangle$. Essentially, we
project $|z\rangle$ on outcome $|x\rangle$ using projection $|x\rangle\langle x|$, resulting
in $\langle x | z\rangle |x\rangle$.

Definition 2: The (self-delimiting) complexity of $|x\rangle$ with
respect to quantum Turing machine $Q$ with $y$ as conditional
input given for free is

$K_Q(|x\rangle \mid y) = \min_p \{ l(p) + \lceil -\log \|\langle z | x\rangle\|^2 \rceil : Q(p,y) = |z\rangle \}$   (1)

where $l(p)$ is the number of bits in the program $p$, auxiliary
$y$ is an input (possibly quantum) state, and $|x\rangle$ is the target
state that one is trying to describe.

Note that $|z\rangle$ is the quantum state produced by the computation
$Q(p,y)$, and therefore, given $Q$ and $y$, completely
determined by $p$. Therefore, we obtain the minimum of
the right-hand side of the equality by minimizing over $p$
only. We call the $|z\rangle$ that minimizes the right-hand side
the directly computed part of $|x\rangle$ while $\lceil -\log \|\langle z | x\rangle\|^2 \rceil$ is
the approximation part.
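
The two-part cost in (1) can be explored on a toy table of programs. In the sketch below a dictionary maps hypothetical program strings to the states they are assumed to output; it stands in for $Q(p,y)$ and is not a quantum Turing machine. The target state, the program strings, and the function names are all assumptions made for illustration. The rounding before the ceiling guards against floating-point noise around exact powers of two.

```python
import math


def inner_product(z, x):
    """<z|x> for state vectors given as sequences of amplitudes."""
    return sum(zi.conjugate() * xi for zi, xi in zip(z, x))


def fidelity(z, x) -> float:
    """Squared norm of <z|x>, clipped to 1 against rounding error."""
    return min(1.0, abs(inner_product(z, x)) ** 2)


def two_part_cost(program: str, z, x):
    """l(p) + ceil(-log2 fidelity): program length plus approximation part."""
    f = fidelity(z, x)
    if f == 0.0:
        return math.inf  # an orthogonal output cannot describe the target
    return len(program) + math.ceil(round(-math.log2(f), 9))


def toy_complexity(outputs, x):
    """Minimize the two-part cost over a finite table {program: state}."""
    return min((two_part_cost(p, z, x), p) for p, z in outputs.items())


if __name__ == "__main__":
    s = 1.0 / math.sqrt(2.0)
    target = [s, s]                     # (|0> + |1>)/sqrt(2)
    outputs = {
        "0": [1.0, 0.0],                # short program, fidelity 1/2
        "110101": [s, s],               # longer program, exact output
    }
    for p, z in outputs.items():
        print(p, two_part_cost(p, z, target))
    print("minimum:", toy_complexity(outputs, target))
```

In this toy table the short, inexact program wins: one extra bit for the approximation part is cheaper than five extra program bits.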

Quantum Kolmogorov complexity is the sum of two
terms: the first term is the integral length of a binary program,
and the second term, the minlog probability term,
corresponds to the length of the corresponding code word
in the Shannon-Fano code associated with that probability
distribution, see for example [?], and is thus also expressed
in an integral number of bits. Let us consider
this relation more closely: For a quantum system $|z\rangle$ the
quantity $P(x) = \|\langle z | x\rangle\|^2$ is the probability that the
system passes a test for $|x\rangle$, and vice versa. The term
$\lceil -\log \|\langle z | x\rangle\|^2 \rceil$ can be viewed as the code word length
to redescribe $|x\rangle$, given $|z\rangle$ and an orthonormal basis with
$|x\rangle$ as one of the basis vectors, using the Shannon-Fano prefix
code. This works as follows: Write $N = 2^n$. For every
state $|z\rangle$ in $(2^n)$-dimensional Hilbert space with basis vectors
$B = \{|e_0\rangle, \ldots, |e_{N-1}\rangle\}$ we have $\sum_{i=0}^{N-1} \|\langle e_i | z\rangle\|^2 = 1$.
If the basis has $|x\rangle$ as one of the basis vectors, then we can
consider $|z\rangle$ as a random variable that assumes value $|x\rangle$
with probability $\|\langle x | z\rangle\|^2$. The Shannon-Fano code word
for $|x\rangle$ in the probabilistic ensemble $(B, (\|\langle e_i | z\rangle\|^2)_i)$ is
based on the probability $\|\langle x | z\rangle\|^2$ of $|x\rangle$, given $|z\rangle$, and has
length $\lceil -\log \|\langle x | z\rangle\|^2 \rceil$. Considering a canonical method
of constructing an orthonormal basis $B = |e_0\rangle, \ldots, |e_{N-1}\rangle$
from a given basis vector, we can choose $B$ such that
$K(B) = \min_i \{K(|e_i\rangle)\}$. The Shannon-Fano code is appropriate
for our purpose since it is optimal in that it
achieves the least expected code word length (the expectation
taken over the probability of the source words) up
to 1 bit by Shannon's Noiseless Coding Theorem. As in
the classical case the quantum Kolmogorov complexity is
an integral number.

The main property required to be able to develop a
meaningful theory is that our definition satisfies a so-called
Invariance Theorem (see also Appendix A). Below we use
"$U$" to denote a special type of universal (quantum) Turing
machine rather than a unitary matrix.

Theorem 1 (Invariance) There is a universal machine $U$,
such that for all machines $Q$, there is a constant $c_Q$ (the
length of the description of the index of $Q$ in the enumeration),
such that for all quantum states $|x\rangle$ and all auxiliary
inputs $y$ we have:

$K_U(|x\rangle \mid y) \leq K_Q(|x\rangle \mid y) + c_Q$.

Proof: Assume that the program $p$ that minimizes
the right-hand side of (1) is $p_0$ and the computed $|z\rangle$ is
$|z_0\rangle$:

$K_Q(|x\rangle \mid y) = l(p_0) + \lceil -\log \|\langle z_0 | x\rangle\|^2 \rceil$.

There is a universal quantum Turing machine $U$ in the
standard enumeration $Q_1, Q_2, \ldots$ such that for every quantum
Turing machine $Q$ in the enumeration there is a self-delimiting
program $i_Q$ (the index of $Q$) and $U(i_Q p, y) = Q(p,y)$
for all $p, y$: if $Q(p,y) = |z\rangle$ then $U(i_Q p, y) = |z\rangle$.
In particular, this holds for $p_0$ such that $Q$ with auxiliary
input $y$ halts with output $|z_0\rangle$. But $U$ with auxiliary input
$y$ halts on input $i_Q p_0$ also with output $|z_0\rangle$. Consequently,
the program $q$ that minimizes the right-hand side of (1)
with $U$ substituted for $Q$, and computes $U(q,y) = |u\rangle$ for
some state $|u\rangle$ possibly different from $|z_0\rangle$, satisfies

$K_U(|x\rangle \mid y) = l(q) + \lceil -\log \|\langle u | x\rangle\|^2 \rceil$
$\leq l(i_Q p_0) + \lceil -\log \|\langle z_0 | x\rangle\|^2 \rceil$.

Since $l(i_Q p_0) = l(i_Q) + l(p_0)$, combining the two displayed
relations and setting $c_Q = l(i_Q)$ proves the theorem. ■
The key point is not that the universal Turing machine
viewed as description method necessarily gives the
shortest description in each case, but that no other effective
description method can improve on it infinitely often
by more than a fixed constant. For every pair $U, U'$ of universal
Turing machines as in the proof of Theorem 1, there
is a fixed constant $c_{U,U'}$, depending only on $U$ and $U'$, such
that for all $|x\rangle, y$ we have:

$|K_U(|x\rangle \mid y) - K_{U'}(|x\rangle \mid y)| \leq c_{U,U'}$.

To see this, substitute $U'$ for $Q$ in the inequality of Theorem 1,
and, conversely, substitute $U'$ for $U$ and $U$ for $Q$ in the same
inequality, and combine the two
resulting inequalities. While the complexities according to
$U$ and $U'$ are not exactly equal, they are equal up to a
fixed constant for all $|x\rangle$ and $y$. Therefore, one or the other
fixed choice of reference universal machine $U$ yields resulting
complexities that are within a fixed constant envelope of
each other for all arguments.

Programmers are generally aware that programs for sym- 
bolic manipulation tend to be shorter when they are ex- 
pressed in the LISP programming language than if they are 
expressed in FORTRAN, while for numerical calculations 
the opposite is the case. Or is it? The Invariance Theorem 
in fact shows that to express an algorithm succinctly in a 
program, it does not matter which programming language 
we use — up to a fixed additive constant (representing the 
length of compiling programs from either language into the 
other language) that depends only on the two programming 
languages compared. For further discussion of effective
optimality and invariance see [13].

Definition 3: We fix once and for all a reference univer- 
sal quantum Turing machine U and define the quantum 
Kolmogorov complexity as 

$$K(|x\rangle \mid y) = K_U(|x\rangle \mid y),$$
$$K(|x\rangle) = K_U(|x\rangle \mid \epsilon),$$

where $\epsilon$ denotes the absence of conditional information.

The definition is continuous: If two quantum states are
very close then their quantum Kolmogorov complexities are
very close. Furthermore, we can approximate every
(pure quantum) state $|x\rangle$ to arbitrary closeness, ||; in
particular, for every constant $\epsilon > 0$ we can compute a (pure
quantum) state $|z\rangle$ such that $\|\langle z | x\rangle\|^2 > 1 - \epsilon$. One can
view this as the probability of obtaining the possibly
noncomputable outcome $|x\rangle$ when executing projection $|x\rangle\langle x|$
on $|z\rangle$ and measuring outcome $|x\rangle$. For this definition to
be useful it should satisfy:

• The complexity of a pure state that can be directly com- 
puted should be the length of the shortest program that 
computes that state. (If the complexity is less, then this
may lead to discontinuities when we restrict quantum Kol- 
mogorov complexity to the domain of classical objects.) 

• The quantum Kolmogorov complexity of a classical ob- 
ject should equal the classical Kolmogorov complexity of 
that object (up to a constant additive term). 

• The quantum Kolmogorov complexity of a quantum ob- 
ject should have an upper bound. (This is necessary for 
the complexity to be approximable from above, even if the 
quantum object is available in as many copies as we re- 
quire.) 



• Most objects should be "incompressible" in terms of 
quantum Kolmogorov complexity. 

• In the classical case the average self-delimiting Kol- 
mogorov complexity of the discrete set of all n-bit strings 
under some distribution equals the Shannon entropy up 
to an additive constant depending on the complexity of 
the distribution concerned. In our setting we would like 
to know the relation between the expected n-qubit quan- 
tum Kolmogorov complexity, the expectation taken over a 
computable (semi-) measure over the continuously many n- 
qubit states, with von Neumann entropy. Perhaps the con- 
tinuous set can be restricted to a representative discrete 
set. We have no results along these lines. One problem 
may be that in the quantum situation there can be many 
different mixtures of pure quantum states that give rise to 
the same density matrix, and thus have the same von Neu- 
mann entropy. It is possible that the average Kolmogorov 
complexities of different mixtures with the same density 
matrix (or density matrices with the same eigenvalues) are 
also different (and therefore not all of them can be equal 
to the single fixed von Neumann entropy which depends 
only on the eigenvalues). In contrast, in the approach of 

using semicomputable semi-density matrices, as dis- 
cussed in the Introduction, equality of "average min-log 
universal density" to the von Neumann entropy (up to the 
Kolmogorov complexity of the semicomputable density it- 
self) follows simply and similarly to the classical case. But 
in this approach the interpretation of "min-log universal 
density" in terms of length of descriptions of one form or 
the other is quite problematic (in contrast with the classi- 
cal case) and we thus lose the main motivation of quantum 
Kolmogorov complexity. 

A. Consistency with Classical Complexity 

Our proposal would not be useful if it were the case that 
for a directly computable object the complexity is less than 
the shortest program to compute that object. This would 
imply that the code corresponding to the probabilistic com- 
ponent in the description is possibly shorter than the differ- 
ence in program lengths for programs for an approximation 
of the object and the object itself. This would penalize 
definite description compared to probabilistic description 
and in case of classical objects would make quantum Kol- 
mogorov complexity less than classical Kolmogorov com- 
plexity. 

Theorem 2 (Consistency) Let $U$ be the reference universal
quantum Turing machine and let $|x\rangle$ be a basis vector
in a directly computable orthonormal basis $B$, given
$y$: there is a program $p$ such that $U(p, y) = |x\rangle$. Then

$$K(|x\rangle \mid y) = \min_p \{ l(p) : U(p, y) = |x\rangle \}$$

up to $\pm K(B \mid y)$.
Proof: Let $|z\rangle$ be such that

$$K(|x\rangle \mid y) = \min_q \{ l(q) + \lceil -\log \|\langle z | x\rangle\|^2 \rceil : U(q, y) = |z\rangle \}.$$

Denote the program $q$ that minimizes the right-hand side
by $q_{\min}$ and the program $p$ that minimizes the expression
in the statement of the theorem by $p_{\min}$.



VITANYI: QUANTUM KOLMOGOROV COMPLEXITY BASED ON CLASSICAL DESCRIPTIONS 






A dovetailed computation is a method related to Cantor's
celebrated diagonalization method: run all programs
alternatingly in such a way that every program eventually
makes progress. On a list of programs $P_1, P_2, \ldots$ one
divides the overall computation into stages $k = 1, 2, \ldots$. In
stage $k$ of the overall computation one executes the $i$th
computation step of every program $P_{k-i+1}$ for $i = 1, \ldots, k$.
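The dovetailing schedule described above can be sketched as follows. This is only an illustration, assuming each program is modeled as a Python generator that yields once per computation step and returns a value when it halts; the names `dovetail`, `countdown`, and `loop_forever` are hypothetical and do not come from the paper. A program admitted in stage $j$ performs its step $k - j + 1$ in stage $k$, which matches the indexing in the text.

```python
from itertools import count


def dovetail(programs, max_stages=None):
    """Yield (index, result) for each program that halts, in order of halting.

    `programs` is an iterable (possibly infinite) of generators. Each call
    to next() on a generator is one computation step; a generator signals
    halting by returning, and its return value is taken as the output.
    """
    source = iter(programs)
    started = []   # programs P_1, ..., P_k admitted so far
    halted = set()
    stages = count(1) if max_stages is None else range(1, max_stages + 1)
    for _stage in stages:
        # Stage k admits program P_k, if any programs remain ...
        new_program = next(source, None)
        if new_program is not None:
            started.append(new_program)
        # ... and advances every admitted, still-running program by one step.
        for index, program in enumerate(started):
            if index in halted:
                continue
            try:
                next(program)
            except StopIteration as stop:
                halted.add(index)
                yield index, stop.value
        if new_program is None and len(halted) == len(started):
            return


def countdown(steps, result):
    """Hypothetical toy program: runs for `steps` steps, then halts."""
    for _ in range(steps):
        yield
    return result


def loop_forever():
    """Hypothetical toy program that never halts."""
    while True:
        yield
```

A non-halting program does not block the others: with `loop_forever()` listed first, the two halting toy programs are still reported, the shorter computation first.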

By running $U$ on all binary strings (candidate programs)
simultaneously in dovetailed fashion, one can enumerate all
objects that are directly computable, given $y$, in order of
their halting programs. Assume that $U$ is also given a
program $b$ of length $K(B \mid y)$ to compute $B$, that is, to
enumerate the basis vectors in $B$. This way $q_{\min}$ computes $|z\rangle$
and the program $b$ computes $B$. Now since the vectors of $B$ are
mutually orthogonal

$$\sum_{|e\rangle \in B} \|\langle z | e\rangle\|^2 = 1.$$

Since $|x\rangle$ is one of the basis vectors,
$\lceil -\log \|\langle z | x\rangle\|^2 \rceil$ is the length of a prefix code
word (the Shannon-Fano code) to compute $|x\rangle$ from $|z\rangle$ and $B$.
Denoting this code word by $r$, the concatenation $q_{\min} b r$ is a
program to compute $|x\rangle$: parse it into $q_{\min}$, $b$, and $r$ using the
self-delimiting property of $q_{\min}$ and $b$. Use $q_{\min}$ to compute
$|z\rangle$ and use $b$ to compute $B$, and determine the probabilities
$\|\langle z | e\rangle\|^2$ for all basis vectors $|e\rangle$ in $B$. Determine the
Shannon-Fano code words for all the basis vectors from these probabilities.
Since $r$ is the code word for $|x\rangle$ we can now decode $|x\rangle$.
Therefore,

$$l(q_{\min}) + \lceil -\log \|\langle z | x\rangle\|^2 \rceil \ge l(p_{\min}) - K(B \mid y),$$

which was what we had to prove. ■
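The role of the Shannon-Fano code in this proof can be illustrated numerically. The sketch below is not part of the paper: it assumes a small hypothetical list of squared overlaps $\|\langle z | e\rangle\|^2$ that sum to 1, assigns each basis vector a code-word length $\lceil -\log_2 p \rceil$, and checks the Kraft inequality, which guarantees that a prefix code with these lengths exists. The function names are illustrative.

```python
import math


def shannon_fano_lengths(probabilities):
    """Return code-word lengths ceil(-log2 p) for every outcome with p > 0."""
    return [math.ceil(-math.log2(p)) for p in probabilities if p > 0]


def kraft_sum(lengths):
    """Sum of 2^-l over the lengths; a prefix code exists iff this is <= 1."""
    return sum(2.0 ** -length for length in lengths)


# Hypothetical squared overlaps of a computed state |z> with the four
# vectors of an orthonormal basis B (assumed values, they sum to 1).
overlaps = [0.5, 0.25, 0.125, 0.125]
lengths = shannon_fano_lengths(overlaps)
```

For these dyadic probabilities the lengths are 1, 2, 3, 3 bits and the Kraft sum is exactly 1; for non-dyadic probabilities the sum stays below 1.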
Corollary 1: On classical objects (that is, the natural
numbers or finite binary strings that are all directly
computable) the quantum Kolmogorov complexity coincides up
to a fixed additional constant with the self-delimiting
Kolmogorov complexity since $K(B \mid n) = O(1)$ for the standard
classical basis $B = \{0, 1\}^n$. (We assume that the information
about the dimensionality of the Hilbert space is given
conditionally.)

Remark 3: Fixed additional constants are no problem 
since the complexity also varies by fixed additional con- 
stants due to the choice of reference universal Turing ma- 
chine. 

Remark 4: The original plain complexity defined by
Kolmogorov [12] is based on Turing machines where the
input is delimited by distinguished markers. A similar proof
used to compare quantum Kolmogorov complexity with the 
plain (not self-delimiting) Kolmogorov complexity on clas- 
sical objects shows that they coincide, but only up to a 
logarithmic additive term. 

B. Upper Bound on Complexity 

A priori, in the worst case $K(|x\rangle \mid n)$ is possibly $\infty$. We
show that the worst case has a $2n$ upper bound.

Theorem 3 (Upper Bound) For all $n$-qubit quantum
states $|x\rangle$ we have $K(|x\rangle \mid n) \le 2n$.

Proof: Write $N = 2^n$. For every state $|x\rangle$
in $(2^n)$-dimensional Hilbert space with basis vectors
$|e_0\rangle, \ldots, |e_{N-1}\rangle$ we have
$\sum_{i=0}^{N-1} \|\langle e_i | x\rangle\|^2 = 1$. Hence
there is an $i$ such that $\|\langle e_i | x\rangle\|^2 \ge 1/N$. Let $p$ be a
program of length $K(i \mid n)$, up to an additive constant, that
constructs the basis state $|e_i\rangle$ given $n$. Then $l(p) \le n$, and
therefore $K(|x\rangle \mid n) \le l(p) - \log(1/N) \le 2n$.

■
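The counting step in this proof (some basis vector must have squared overlap at least $1/N$) is easy to check numerically. The following is a minimal sketch under stated assumptions: states are plain lists of complex amplitudes in the standard basis, the index of the chosen basis vector is charged $n$ bits, and the helper names are hypothetical.

```python
import math
import random


def random_state(n, rng):
    """Return a random n-qubit state as a unit-norm list of 2**n amplitudes."""
    amps = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2 ** n)]
    norm = math.sqrt(sum(abs(a) ** 2 for a in amps))
    return [a / norm for a in amps]


def basis_state_bound(state):
    """Return (i, bound) for the basis vector |e_i> with the largest overlap.

    bound = n + ceil(-log2 |<e_i|x>|^2): n bits for the index i (assumed
    cost of the program) plus the approximation penalty.
    """
    n = len(state).bit_length() - 1          # len(state) == 2**n
    probs = [abs(a) ** 2 for a in state]
    i = max(range(len(probs)), key=probs.__getitem__)
    return i, n + math.ceil(-math.log2(probs[i]))
```

Because the largest of $N$ nonnegative numbers summing to 1 is at least $1/N$, the penalty term never exceeds $n$, so the returned bound never exceeds $2n$.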

Remark 5: This upper bound is sharp since Gacs Q has 

recently shown that there are states $|x\rangle$ with
$K(|x\rangle \mid n) > 2n - 2 \log n$.

C. Computability 

In the classical case Kolmogorov complexity is not com- 
putable but can be approximated from above by a com- 
putable process. The non-cloning property prevents us 
from perfectly copying an unknown pure quantum state 
given to us pi] , 0- Therefore, an approximation from 
above that requires checking every output state against the 
target state destroys the latter. It is possible to prepare ap- 
proximate copies from the target state, but the more copies 
one prepares the less they approximate the target state |l(| , 
and this deterioration appears on the surface to prevent use 
in our application below. To sidestep the fragility of the 
pure quantum target state, we simply require that it is an 
outcome, in as many copies as we require, in a measure- 
ment that we have available. Another caveat with respect 
to item (ii) in the theorem below is that, since the approxi- 
mation algorithm in the proof doesn't discriminate between 
entangled output states and un-entangled output states, we 
approximate the quantum Kolmogorov complexity by a di- 
rectly computed part that is possibly a mixture rather than 
a pure state. Thus, the approximated value may be that of 
quantum Kolmogorov complexity based on computations 
halting with entangled output states, which is conceivably 
less than that of un-entangled outputs. This is the only 
result in this paper that depends on that distinction. 

Theorem 4 (Computability) Let $|x\rangle$ be the pure quantum
state we want to describe.

(i) The quantum Kolmogorov complexity $K(|x\rangle)$ is not
computable.

(ii) If we can repeatedly execute the projection $|x\rangle\langle x|$ and
perform a measurement with outcome $|x\rangle$, then the quantum
Kolmogorov complexity $K(|x\rangle)$ can be approximated
from above by a computable process with arbitrarily small
probability of error $\alpha$ of giving a too small value.

Proof: The uncomputability follows a fortiori from 
the classical case. The semicomputability follows because 
we have established an upper bound on the quantum Kol- 
mogorov complexity, and we can simply enumerate all halt- 
ing classical programs up to that length by running their 
computations dovetailed fashion. The idea is as follows: 

Let the target state be $|x\rangle$ of $n$ qubits. Then, $K(|x\rangle \mid n) \le 2n$.
(The unconditional case $K(|x\rangle)$ is similar with $2n$ replaced
by $2(n + \log n)$.) We want to identify a program
$x^*$ such that $p = x^*$ minimizes $l(p) - \log \|\langle x | U(p, n)\rangle\|^2$
among all candidate programs. To identify it in the limit,
for some fixed $k$ satisfying (3) below for given $n, \alpha, \epsilon$,
repeat the computation of every halting program $p$ with
$l(p) \le 2n$ at least $k$ times and perform the assumed
projection and measurement. For every halting program
$p$ in the dovetailing process we estimate the probability
$q = \|\langle x | U(p, n)\rangle\|^2$ from the fraction $m/k$: the fraction
of $m$ positive outcomes out of $k$ measurements. The probability
that the estimate $m/k$ is off from the real value
$q$ by more than $\epsilon q$ is given by Chernoff's bound: for
$0 \le \epsilon \le 1$,

$$P(|m - qk| > \epsilon q k) \le 2 e^{-\epsilon^2 q k / 3}. \tag{2}$$



This means that the probability that the deviation $|m/k - q|$
exceeds $\epsilon q$ vanishes exponentially with growing $k$. Every
candidate program $p$ satisfies (2) with its own $q$ or $1 - q$.
There are $O(2^{2n})$ candidate programs $p$ and hence also
$O(2^{2n})$ outcomes $U(p, n)$ with halting computations. We
use this estimate to upper bound the probability of error
$\alpha$. For given $k$, the probability that some halting candidate
program $p$ satisfies $|m - qk| > \epsilon q k$ is at most $\alpha$ with

$$\alpha \le \sum_{U(p,n) < \infty} 2 e^{-\epsilon^2 q k / 3},$$

where $q$ in each term is the value belonging to the program $p$ of
that term. The probability that no halting program does so is at least
$1 - \alpha$. That is, with probability at least $1 - \alpha$ we have

$$(1 - \epsilon) q \le \frac{m}{k} \le (1 + \epsilon) q$$

for every halting program $p$. It is convenient to restrict
attention to the case that all $q$'s are large. Without loss of
generality, if $q < \frac{1}{2}$ then consider $1 - q$ instead of $q$. Then,

$$\log \alpha \le 2n - (\epsilon^2 k \log e)/6. \tag{3}$$



The approximation algorithm is as follows: 

Step 0: Set the required degree of approximation $\epsilon < 1/2$
and the number of trials $k$ to achieve the required probability
of error $\alpha$.

Step 1: Dovetail the running of all candidate programs
until the next halting program is enumerated. Repeat the
computation of the new halting program $k$ times.

Step 2: If there is more than one program $p$ that
achieves the current minimum, then choose the program
with the least length (and hence the least number of
successful observations). If $p$ is the selected program with $m$
successes out of $k$ trials then set the current approximation
of $K(|x\rangle)$ to

$$l(p) - \log \frac{m}{(1 + \epsilon) k}.$$

This exceeds the proper value of the approximation based
on the real $q$ instead of $m/k$ by at most 1 bit for all $\epsilon < 1$.

Step 3: Goto Step 1. ■
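To get a feeling for the number of trials that Step 0 has to choose, relation (3) can be solved for $k$. The sketch below is an illustration only, with base-2 logarithms assumed: `trials_needed` returns the smallest integer $k$ for which the right-hand side of (3) drops to $\log \alpha$, and `estimate_bits` evaluates the Step 2 expression. Both function names are hypothetical, and the derivation of (3) already assumes $q \ge 1/2$.

```python
import math


def trials_needed(n, eps, alpha):
    """Smallest integer k with 2n - (eps^2 * k * log2(e)) / 6 <= log2(alpha)."""
    if not (0 < eps < 1 and 0 < alpha < 1):
        raise ValueError("need 0 < eps < 1 and 0 < alpha < 1")
    k = 6 * (2 * n - math.log2(alpha)) / (eps ** 2 * math.log2(math.e))
    return math.ceil(k)


def estimate_bits(program_length, successes, trials, eps):
    """Step 2 estimate l(p) - log2(m / ((1 + eps) * k)); infinite if m == 0."""
    if successes == 0:
        return math.inf
    return program_length - math.log2(successes / ((1 + eps) * trials))
```

The required $k$ grows linearly in $n$ and in $\log(1/\alpha)$, and quadratically in $1/\epsilon$.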



D. Incompressibility 

Definition 4: A pure quantum state $|x\rangle$ is computable
if $K(|x\rangle) < \infty$. Hence all finite-dimensional pure quantum
states are computable. We call a pure quantum
state directly computable if there is a program $p$ such that
$U(p) = |x\rangle$.

We have shown that quantum Kolmogorov complexity
coincides with classical Kolmogorov complexity on classical
objects in Theorem 2. In the proof we demonstrated
in fact that the quantum Kolmogorov complexity is the
length of the classical program that directly computes the
classical objects. By the standard counting argument, Section [A[,
the standard orthonormal basis (consisting of all
$n$-bit strings) of the $(2^n)$-dimensional Hilbert space $\mathcal{H}_N$
($N = 2^n$) has at least $2^n(1 - 2^{-c})$ basis vectors $|e_i\rangle$ that
satisfy $K(|e_i\rangle \mid n) \ge n - c$. But what about nonclassical
orthonormal bases? They may not satisfy the standard
counting argument. Since there are continuously
many pure quantum states and the range of quantum Kolmogorov
complexity has only countably many values, there
are integer values that are the Kolmogorov complexities of
continuously many pure quantum states.

In particular, since the quantum Kolmogorov complexity
of an $n$-qubit state is $\le 2n$, the set of directly computable
pure $n$-qubit states has cardinality $A \le 2^{2n + O(1)}$.
They divide the set of unit vectors in $\mathcal{H}_N$, the surface of
the $N$-dimensional ball with unit radius in Hilbert space,
into $A$-many $(N - 1)$-dimensional connected surfaces, called
patches, each consisting of one directly computable pure
$n$-qubit state $|x\rangle$ together with those pure $n$-qubit states $|y\rangle$
of which $|x\rangle$ is the directly computed part (Definition 2).
In every patch all $|y\rangle$ with the same $\|\langle x | y\rangle\|$ have both the
same complexity and the same directly computed part, and
for every fixed patch and every fixed value of approximation
part occurring in the patch, there are continuously many
$|y\rangle$ with identical directly computed parts and approximation
parts. A priori it is possible that this is the case for two
distinct basis vectors in a nonclassical orthonormal basis,
which implies that the standard counting argument cannot
be used to show the incompressibility of basis vectors of
nonclassical orthonormal bases.

Lemma 1: There is a particular (possibly nonclassical)
orthonormal basis of the $(2^n)$-dimensional Hilbert space
$\mathcal{H}_N$, computed from the directly computed pure quantum
states, such that at least $2^n(1 - 2^{-c})$ basis vectors $|e_i\rangle$
satisfy $K(|e_i\rangle \mid n) \ge n - c$.

Proof: Every orthonormal basis of $\mathcal{H}_N$ has $2^n$ basis
vectors and there are at most $m \le \sum_{i=0}^{n-c-1} 2^i = 2^{n-c} - 1$
programs of length less than $n - c$. Hence there are at
most $m$ programs of length $< n - c$ available to approximate
the basis vectors. We construct an orthonormal basis
satisfying the lemma: The set of directly computed pure
quantum states $|x_0\rangle, \ldots, |x_{m-1}\rangle$ span an $m'$-dimensional
subspace $\mathcal{A}$ with $m' \le m$ in the $(2^n)$-dimensional Hilbert
space $\mathcal{H}_N$ such that $\mathcal{H}_N = \mathcal{A} \oplus \mathcal{A}^{\perp}$.
Here $\mathcal{A}^{\perp}$ is a $(2^n - m')$-dimensional subspace of $\mathcal{H}_N$
such that every vector in it is perpendicular to every vector
in $\mathcal{A}$. We can write every element $|x\rangle \in \mathcal{H}_N$ as

$$\sum_{i=0}^{m'-1} \alpha_i |a_i\rangle + \sum_{i=0}^{2^n - m' - 1} \beta_i |b_i\rangle,$$

where the $|a_i\rangle$'s form an orthonormal basis of $\mathcal{A}$ and the
$|b_i\rangle$'s form an orthonormal basis of $\mathcal{A}^{\perp}$, so that the $|a_i\rangle$'s
and $|b_i\rangle$'s together form an orthonormal basis for $\mathcal{H}_N$. For every
state $|x_j\rangle \in \mathcal{A}$, directly computed by a program $x_j^*$ given
$n$, and every basis vector $|b_i\rangle \in \mathcal{A}^{\perp}$ we have
$\|\langle x_j | b_i\rangle\|^2 = 0$.
Therefore, describing $|b_i\rangle$ through any of these programs costs
$l(x_j^*) - \log \|\langle x_j | b_i\rangle\|^2 = \infty > n - c$
($0 \le j < m$, $0 \le i < 2^n - m'$), and hence
$K(|b_i\rangle \mid n) \ge n - c$. This proves the lemma. ■
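The counting behind the lemma can be made concrete. The sketch below (illustrative helper names, not from the paper) counts the binary programs of length less than $n - c$ and derives the resulting lower bound on the fraction of the $2^n$ basis vectors that cannot be served by such a short program.

```python
def short_program_count(n, c):
    """Number of binary programs of length < n - c, i.e. 2^(n-c) - 1."""
    return sum(2 ** i for i in range(n - c))


def incompressible_fraction(n, c):
    """Lower bound on the fraction of 2^n basis vectors with K >= n - c."""
    return (2 ** n - short_program_count(n, c)) / 2 ** n
```

For $n = 10$ and $c = 3$ there are $127$ short programs, so at least $897$ of the $1024$ basis vectors are incompressible in this sense, slightly more than the fraction $1 - 2^{-3}$.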
Theorem 5 (Incompressibility) The uniform probability

$$\Pr\{|x\rangle : l(|x\rangle) = n,\ K(|x\rangle \mid n) \ge n - c\} \ge 1 - 1/2^c.$$

Proof: The theorem follows immediately from a
generalization of Lemma 1 to arbitrary orthonormal bases:

Claim 1: Every orthonormal basis $|e_0\rangle, \ldots, |e_{2^n - 1}\rangle$ of
the $(2^n)$-dimensional Hilbert space $\mathcal{H}_N$ has at least
$2^n(1 - 2^{-c})$ basis vectors $|e_i\rangle$ that satisfy
$K(|e_i\rangle \mid n) \ge n - c$.

Proof: Use the notation of the proof of Lemma 1. Let
$A$ be a set initially containing the programs of length less
than $n - c$, and let $B$ be a set initially containing the set
of basis vectors $|e_i\rangle$ with $K(|e_i\rangle \mid n) < n - c$. Assume to
the contrary that $|B| > 2^{n-c}$. Then at least two of them,
say $|e_0\rangle$ and $|e_1\rangle$, and some pure quantum state $|x\rangle$ directly
computed from a $< (n - c)$-length program satisfy

$$K(|e_i\rangle \mid n) = K(|x\rangle \mid n) + \lceil -\log \|\langle e_i | x\rangle\|^2 \rceil, \tag{4}$$

with $|x\rangle$ being the directly computed part of both $|e_i\rangle$, $i = 0, 1$.
This means that $K(|x\rangle \mid n) < n - c - 1$ since not both
$|e_0\rangle$ and $|e_1\rangle$ can be equal to $|x\rangle$. Hence for every directly
computed pure quantum state of complexity $n - c - 1$ there
is at most one basis state, say $|e\rangle$, of the same complexity
(in fact only if that basis state is identical with the directly
computed state). Now eliminate every directly computed
pure quantum state $|x\rangle$ of complexity $n - c - 1$ from the
set $A$, and the basis state $|e\rangle$ as above (if it exists) from
$B$. We are now left with $|B| > 2^{n-c} - 1$ basis states of
which the directly computed parts are included in $A$ with
$|A| \le 2^{n-c-1} - 1$ with every element in $A$ of complexity
$\le n - c - 2$. Repeating the same argument we end up with
$|B| \ge 1$ basis vectors of which the directly computed parts
are elements of the empty set $A$, which is impossible. ■

■

Example 1: It may be instructive to check the behavior
of the approximation part $-\log \|\langle x | z\rangle\|^2$ in Definition 2
on a nontrivial example. Let $x$ be a random classical string
with $K(x) \ge l(x)$ and let $y$ be a string obtained from $x$
by complementing one bit, say in position $j$. It is known
(Exercise 2.2.8 in [13] due to I. Csiszár) that for every such
$x$ of length $n$ there is such a $y$ with complexity
$K(y \mid n) = n - \log n + O(1)$. Since
$K(x \mid n) \le K(y \mid n) + K(j \mid n)$ we
have $K(j \mid n) \ge \log n$ (and, since $j \le n$ we also have
$K(j \mid n) \le \log n$). Now let $|z\rangle$ be a pure quantum state
which has classical bits except the difference qubit between
$x$ and $y$ that has equal probabilities of being observed as
"1" and as "0." We can prepare $|z\rangle$ by giving $y$ and the
position of the difference qubit (in $\log n$ bits) and therefore
$K(|z\rangle \mid n) \le n$.

From $|z\rangle$ we have probability $\frac{1}{2}$ of obtaining $x$ by observing
the difference qubit; it follows that
$K(x \mid n) \le K(|z\rangle \mid n, j)$,
and, since $K(|z\rangle \mid n) \ge K(|z\rangle \mid n, j)$, we have
$K(|z\rangle \mid n) \ge n$.

From $|z\rangle$ we also have probability $\frac{1}{2}$ of obtaining $y$ by
observing the difference qubit, which yields that
$K(y \mid n) \le K(|z\rangle \mid n, j)$. Since also
$K(|z\rangle \mid n) \ge K(|z\rangle \mid n, j) \ge K(|z\rangle \mid n) - K(j \mid n) = n - \log n$,
we find $n - \log n \le K(y \mid n) \le n$. This is the strongest conclusion
we can draw about $y$ from the fact that it is the result of
observing one qubit of a high-complexity $|z\rangle$ constructed
as above. Viz., if we flip an $i$th bit of $x$ with complexity
$K(i \mid n) = \log n$, this will not necessarily result in a string
of complexity $n - \log n$ (take for example $i = j/2$ with $j$
as above).

Remark 6: Theorem 3 states an upper bound of 2n on
K(\x) | n). This leaves a relatively large gap with the 
lower bound of n established here. But, as stated earlier, 
Gacs H has shown that there are states |x) with K(\x) \ 

n) > 2n - 2 log n; in fact, most states satisfy this. The proof
appears to support about the same incompressibility results
as in this section, with n replaced by 2n - 2 log n. The proof
goes by analyzing coverings of the (2^n)-dimensional ball of
unit radius, as in ||. 

E. Multiple Copies 

For classical complexity we have K (x, x) = K(x), since a 
classical program to compute x can be used twice; indeed, 
it can be used many times. In the quantum world things 
are not so easy: the no-cloning property mentioned earlier, 
see PH , or the textbooks 0, prevent cloning an 
unknown pure state $|x\rangle$ perfectly to obtain $|x\rangle|x\rangle$: that is,

$K(|x\rangle) \le K(|x\rangle|x\rangle) \le 2K(|x\rangle)$. There is a considerable
literature on the possibility of approximate cloning to obtain
m imperfect copies from an unknown pure state, see for 
example JToj ] . Generally speaking, the more qubits are
involved in the original copy and the more clones one wants to
obtain, the more the fidelity of the obtained clones deterio- 
rates with respect to the original copy. This stands to rea- 
son since high fidelity cloning would enable both superlumi- 
nal signal transmission JlT| and extracting essentially un- 
bounded information concerning the probability amplitude 
from the original qubits. The approximate cloning possibility
suggests that in our setting the approximation penalty
induced by the second (fidelity) term of Definition 2 may
be lenient insofar that the complexity of multiple copies
increases sublinearly with the number of copies. Even apart
from this, the $m$-fold tensor product $|x\rangle^{\otimes m}$ of $|x\rangle$ with itself
lives in a small-dimensional symmetric subspace with the
result that $K(|x\rangle^{\otimes m})$ can be considerably below $mK(|x\rangle)$.
This effect was first noticed in the context of qubit
complexity H| , and it similarly holds for the Kg and KG
complexities in g. Define
$K^+(|x\rangle^{\otimes m}) = \max\{K(|x\rangle^{\otimes m}) : |x\rangle \text{ is a pure } n\text{-qubit quantum state}\}$
and write $N = 2^n$.
The following theorem states that the $m$-fold copy of every
$n$-qubit pure quantum state has complexity at most about
$4 \log \binom{m+N-1}{m}$, and there is a pure quantum state for which
the complexity of the $m$-fold copy achieves $\log \binom{m+N-1}{m}$.
Theorem 6 (Multiples) Assume the above terminology.

$$\log \binom{m+N-1}{m} \le K^+(|x\rangle^{\otimes m})
\le 4\left[K(m) + \log \binom{m+N-1}{m}\right]
+ 2 \log \left[K(m) + \log \binom{m+N-1}{m}\right].$$

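Proof: Recall the Kg and KG complexities of pure
quantum states H mentioned in the Introduction. Denote
by $\mathit{Kg}^+(|x\rangle^{\otimes m})$ and $\mathit{KG}^+(|x\rangle^{\otimes m})$ the maximal values of
$\mathit{Kg}(|x\rangle^{\otimes m})$ and $\mathit{KG}(|x\rangle^{\otimes m})$ over all $n$-qubit states $|x\rangle$,
respectively. All of the following was shown in B (the
notation as above and $|y\rangle$ an arbitrary state, for example
$|x\rangle^{\otimes m}$):

$$\mathit{KG}^+(|x\rangle^{\otimes m}) \le K(m) + \log \binom{m+N-1}{m},$$
$$\mathit{Kg}^+(|x\rangle^{\otimes m}) \ge \log \binom{m+N-1}{m},$$
$$\mathit{Kg}(|y\rangle) \le \mathit{KG}(|y\rangle),$$
$$\mathit{Kg}(|y\rangle) \le K(|y\rangle) \le 4\,\mathit{Kg}(|y\rangle) + 2 \log \mathit{Kg}(|y\rangle).$$

Combining these inequalities gives the theorem. ■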
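The binomial term in the theorem is the logarithm of the dimension of the symmetric subspace in which $|x\rangle^{\otimes m}$ lives, and it is instructive to compare it with the naive value $mn$. The sketch below only evaluates that term; the function name is hypothetical, base-2 logarithms are assumed, and the $K(m)$ term of the upper bound is not computable and is therefore left out.

```python
import math


def log2_sym_dim(n, m):
    """log2 of C(m + N - 1, m), the symmetric-subspace dimension, N = 2**n."""
    N = 2 ** n
    return math.log2(math.comb(m + N - 1, m))
```

For a single copy the value is exactly $n$, while for several copies it falls below $mn$, which is the sublinear growth the discussion before the theorem refers to.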
The theorem gives a measure of how "clonable" indi- 
vidual n-qubit pure quantum states are — rather than indi- 
cate the average success of a fixed cloning algorithm for all 
n-qubit pure quantum states, as in the approximate and 
probabilistic cloning algorithms referred to above. In par- 
ticular it gives an upper bound on the non-clonability of 
every individual pure quantum state, and moreover it tells 
us that there exist individual pure quantum states that 
are quite non-clonable. One can view this as an application
of quantum Kolmogorov complexity. The difference
$K^+(|x\rangle^{\otimes m}) - K^+(|x\rangle)$ expresses the amount of extra
information required for $m$ copies of $|x\rangle$ over that of one
copy, in our particular meaning of



F. Conditional Complexity and Cloning

In Definition 2 the conditional complexity $K(|x\rangle \mid y)$ is
the minimum sum of the length of a classical program to
compute $|z\rangle$ plus the negative logarithm of the probability
of outcome $|x\rangle$ when executing projection $|x\rangle\langle x|$ on $|z\rangle$ and
measuring, given $y$ as input on an auxiliary input tape. In
case y is a classical object, a finite binary string, there is no 
problem with this definition. The situation is more com- 
plicated if instead of a classical 'y' we consider the pure 



quantum state |y) as input on an auxiliary "quantum" in- 
put tape. In the quantum situation the notion of inputs 
consisting of pure quantum states is subject to very special 
rules. 

Firstly, if we are given an unknown pure quantum state 
\y) as input it can be used only once, that is, it is irrevo- 
cably consumed and lost in the computation. It cannot be 
perfectly copied or cloned without destroying the original 
as discussed above. This means that there is a profound 
difference between representing a directly computable pure 
quantum state on the auxiliary tape as a classical program 
or giving it literally. Given as a classical program we can 
prepare and use arbitrarily many copies of it. Given as an 
(unknown) pure quantum state in superposition it can be 
used as perfect input to a computation only once. Thus, 
the manner in which the conditional information is pro- 
vided may make a great difference. A classical program 
for computing a directly computable quantum state carries 
more information than the directly computable quantum 
state itself — much like a shortest program for a classical ob- 
ject carries more information than the object itself. In the 
latter case it consists in partial information about the halt- 
ing problem. In the quantum case of a directly computable 
pure state we have the additional information that the state 
is directly computable and in case of a shortest classical 
program additional information about the halting problem. 
Thus, for classical objects $x$ we have $K(x^m \mid x) = K(m) + O(1)$,
in contrast to:

Theorem 7 (Cloning) For every pure quantum state $|x\rangle$
and every $m$, we have:

$$K(|x\rangle^{\otimes m} \mid |x\rangle) \le K(|x\rangle^{\otimes (m-1)}). \tag{5}$$

Moreover, for every $n$ there exists an $n$-qubit pure quantum
state $|x\rangle$, such that for every $m$, we have:

$$K(|x\rangle^{\otimes m} \mid |x\rangle) \ge \frac{1}{4} K(|x\rangle^{\otimes (m-1)}). \tag{6}$$

Proof: (5) is obvious. (6) follows from Theorem 6. ■
This holds even if $|x\rangle$ is directly computable but is given
in the conditional in the form of an unknown pure quantum
state. The theorem quantifies the "no-cloning" property of
an individual pure quantum state $|x\rangle$: Given $|x\rangle$ and the
task to obtain $m$ copies of $|x\rangle$, we require at least one fourth of the
information needed to obtain $m - 1$ copies of $|x\rangle$, everything in the
sense of quantum Kolmogorov complexity (1). However, if
$|x\rangle$ is directly computable and the conditional is a classical
program to compute this directly computable state, then
that program can be used over and over again, just like in
the case of classical objects:

Lemma 2: For every directly computable pure quantum
state $|x\rangle$ computed by a classical program $p$, and every $m$,

$$K(|x\rangle^{\otimes m} \mid p, m) = O(1). \tag{7}$$



G. Sub-additivity 

Let $N = 2^n$ and $M = 2^m$. Recall the following notation:
If $|x\rangle$ is a pure quantum state in $(2^n)$-dimensional Hilbert
space of $l(|x\rangle) = n$ qubits, and $|y\rangle$ is a pure quantum state






in $(2^m)$-dimensional Hilbert space of $l(|y\rangle) = m$ qubits,
then $|x\rangle \otimes |y\rangle = |x\rangle|y\rangle = |x, y\rangle$ is a pure quantum state in
the $NM$-dimensional Hilbert space consisting of the tensor
product of the two initial spaces consisting of $l(|x, y\rangle) =
n + m$ qubits.

In the classical Kolmogorov complexity case we have 

$K(x) \le K(x, y) \le K(x \mid y) + K(y)$ for every pair of individual
finite binary strings $x$ and $y$ (the analog of the similar
familiar relation that holds among entropies, a stochastic
notion, in Shannon's information theory). The second
inequality is the sub-additivity property of classical Kolmogorov
complexity. Obviously, in the quantum setting
also $K(|x, y\rangle) \ge K(|x\rangle)$ for every pair of individual pure
quantum states $|x\rangle, |y\rangle$. Below we shall show that the
sub-additive property does not hold for quantum Kolmogorov
complexity. But in the restricted case of directly com- 
putable pure quantum states in simple orthonormal bases 
quantum Kolmogorov complexity is sub-additive, just like 
classical Kolmogorov complexity: 

Lemma 3: For directly computable $|x\rangle, |y\rangle$ both of which
belong to (possibly different) orthonormal bases of Kolmogorov
complexity $O(1)$ we have

$$K(|x\rangle, |y\rangle) \le K(|x\rangle \mid |y\rangle) + K(|y\rangle)$$

up to an additive constant term.

Proof: By Theorem 2 there is a program $p_y$ to
compute $|y\rangle$ with $l(p_y) = K(|y\rangle)$ and, by a similar argument
as used in the proof of Theorem 2, a program $p_{y \to x}$
to compute $|x\rangle$ from $|y\rangle$ with $l(p_{y \to x}) = K(|x\rangle \mid |y\rangle)$ up to
additive constants. Use $p_y$ to construct two copies of $|y\rangle$
and $p_{y \to x}$ to construct $|x\rangle$ from one of the copies of $|y\rangle$. The
separation between these concatenated binary programs is
taken care of by the self-delimiting property of the
subprograms. An additional constant term takes care of the
couple of $O(1)$-bit programs that are required. ■

In the classical case we have equality in the lemma (up to
an additive logarithmic term). The proof of the remaining
inequality, as given in the classical case, see [13], doesn't
hold for the quantum case. It would require a decision
procedure that establishes equality between two pure quantum
states without error. It is unknown to the author
whether some approximate decision rule would give some
result along the required lines. We additionally note:

Lemma 4: For all directly computable pure states $|x\rangle$
and $|y\rangle$ we have $K(|x\rangle, |y\rangle) \le K(|y\rangle) - \log \|\langle x | y\rangle\|^2$ up
to an additive logarithmic term.

Proof: $K(|x\rangle \mid |y\rangle) \le -\log \|\langle x | y\rangle\|^2$ by the proof of
Theorem 2. Then, the lemma follows by Lemma 3. ■

In contrast, quantum Kolmogorov complexity of arbitrary
individual pure quantum states dramatically fails to
be sub-additive:

Theorem 8 (Sub-additivity) There are pure quantum
states $|x\rangle, |y\rangle$ of every length $n$ such that
$K(|x, y\rangle) \ge K(|x\rangle) > K(|x\rangle \mid |y\rangle) + K(|y\rangle)$.

Proof: Only the second inequality is non-obvious. Let
$|y\rangle = \frac{1}{\sqrt{2}}(|00 \ldots 0\rangle + |x\rangle)$ and let $x$ be a maximally complex
classical $n$-bit state. Then, $-\log \|\langle y | x\rangle\|^2 = 1$. Hence
the $O(1)$-bit program approximating $|x\rangle$ by observing input
$|y\rangle$, and outputting the resulting outcome, demonstrates
$K(|x\rangle \mid |y\rangle) = O(1)$. Furthermore, $|y\rangle$ is approximated by
$|00 \ldots 0\rangle$ with $-\log \|\langle 00 \ldots 0 | y\rangle\|^2 = 1$. Thus,
$K(|y\rangle) \le \log n + 2 \log \log n$ (the log term is due to the specification
of the length of $|00 \ldots 0\rangle$, and the log log term is due to the
requirement of self-delimiting coding). The theorem follows
since $K(|x\rangle) \ge n$. ■
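The two approximation penalties used in this proof can be checked numerically. The sketch below is an illustration under stated assumptions: $x$ is given as a nonzero bit string, states are represented by real amplitude lists in the standard basis, and the function name is hypothetical.

```python
import math


def witness_overlaps(x_bits):
    """For |y> = (|0...0> + |x>)/sqrt(2), return the two penalties
    (-log2 |<y|x>|^2, -log2 |<0...0|y>|^2)."""
    if "1" not in x_bits:
        raise ValueError("x must differ from the all-zero string")
    dim = 2 ** len(x_bits)
    x_index = int(x_bits, 2)
    y = [0.0] * dim
    y[0] = 1 / math.sqrt(2)
    y[x_index] = 1 / math.sqrt(2)
    p_x = y[x_index] ** 2    # |<x|y>|^2, since |x> is a basis vector
    p_zero = y[0] ** 2       # |<0...0|y>|^2
    return -math.log2(p_x), -math.log2(p_zero)
```

Both penalties come out as 1 bit (up to floating-point rounding), independent of how complex the string $x$ is, which is what drives the gap in the theorem.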

Note that the witness states in the proof have
$K(|x\rangle \mid |y\rangle) + K(|y\rangle) \le \log n$ up to lower-order additive terms.
If we add the length $n$ of the qubit state to the conditional, then
the upper bound reduces to $O(1)$, while the left-hand side in the
theorem stays $\ge n$. In the
light of Theorem 2 (with $n$ substituted in the conditional)
this result indicates that state $|y\rangle$ in the proof, although
obviously directly computable, is not directly computable
as an element from an orthonormal basis of low complexity.
Every orthonormal basis $B$, of which $|y\rangle$ is a basis element,
has complexity $K(B \mid n) \ge n - K(|y\rangle \mid n) = n - O(1)$.

The "no-cloning" or "approximate cloning" theorems in 
@, @, @, @, @, @ essentially show the following: 
Perfect cloning is only possible if we measure according to 
an orthonormal basis of which one of the basis elements is 
the pure quantum state to be measured. Then, the measured pure quantum state can be reproduced at will. Approximate cloning considers how to optimize measurements so that for a random pure quantum state (possibly from a restricted set) the reproduced clone has on average optimal fidelity with the original. Here we see that while the complexity $K(|y\rangle \mid n)$ of the original state $|y\rangle$ in the proof above is $\stackrel{+}{=} 0$, the complexity of an orthonormal basis of which it is a basis element can be (and usually is in view of the incompressibility theorems) at least $n$ for states $|x\rangle$ chosen uniformly at random, or every other complexity in between 0 and $n$ by choice of $|x\rangle$. This gives a rigorous quantification of the quantum cloning fact that if we have full information to reproduce the basis of which the unknown individual pure quantum state $|y\rangle$ is a basis element, then the quantum Kolmogorov complexity of that element is about zero; that is, we can reproduce it at will.

It is easy to see that for the general case of pure states, an 
alternative demonstration of why the sub-additivity prop- 
erty fails, can be given by way of the "non-cloning" prop- 
erty of Theorem ^. 

Lemma 5: There are infinitely many $m$ and $n$ such that there are pure $n$-qubit states $|x\rangle$ for which

$$K(|x\rangle^{\otimes m}) > K(|x\rangle^{\otimes m/2} \mid |x\rangle^{\otimes m/2}) + K(|x\rangle^{\otimes m/2}),$$

where ">" means that the reverse inequality "$\leq$" fails to hold even up to an additive logarithmic term.



Proof: With $N = 2^n$ we have

$$\log \binom{N+k-1}{k} = k(n - \log k + \log e) - \frac{1}{2} \log k + O(1)$$

for $n \to \infty$ with $k$ fixed. Substitution in Theorem ^ shows
that there exists a state \x) such that (up to logarithmic 
additive terms) K{\x)® h ) > kn and K(\xf k/S ) < \kn. So 
writing (again up to logarithmic additive terms) 



K{\x) 



+ 

< 


K(\x 


®k/2 


\xf k/2 ) - 


VK{\x) 


®k/2^ 


± 


K(\x 


®k/2^ 








+ 
K(\x


<»fc/4 


\xf k/4 ) - 


VK{\x) 


®fc/ 4 \ 


+ 


K(\x 










+ 

< 


K(\x 


0k/8 


\x)® k/s ) - 


VK{\x) 




+ 


K(\x 


®fe/8\ 









we obtain kn < ^kn, up to an additive logarithmic term, 

which, with k, n > 0, can only hold for k = n = 0. Hence, 

for large enough k and n, one of the < inequalities in the 
above chain must be false. 
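The binomial estimate at the start of the proof, $\log \binom{N+k-1}{k} = k(n - \log k + \log e) - \frac{1}{2}\log k + O(1)$ with $N = 2^n$, can be sanity-checked numerically. The sketch below is only an illustration; the function names and the choice $k = 5$ are assumptions made for the example.

```python
import math

def exact_log_binom(n, k):
    """log2 of binom(N + k - 1, k) with N = 2**n, computed exactly."""
    N = 2 ** n
    return math.log2(math.comb(N + k - 1, k))

def stirling_estimate(n, k):
    """k(n - log k + log e) - (1/2) log k, the estimate without the O(1) term."""
    return k * (n - math.log2(k) + math.log2(math.e)) - 0.5 * math.log2(k)

k = 5
# Gap between the exact value and the estimate for growing n, k fixed.
gaps = {n: exact_log_binom(n, k) - stirling_estimate(n, k) for n in (10, 20, 40)}

for n, gap in gaps.items():
    print(n, round(gap, 4))
```

For fixed $k$ the gap settles to a constant as $n$ grows, which is what the $O(1)$ term in the estimate expresses.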



Appendix 

I. Appendix: Classical Kolmogorov Complexity 

It is useful to summarize the relevant parts and definitions of classical Kolmogorov complexity; see also [20], and the textbook [13]. The Kolmogorov complexity [12] of a finite object $x$ is the length of the shortest effective binary description of $x$. Let $x, y, z \in \mathcal{N}$, where $\mathcal{N}$ denotes the natural numbers and we identify $\mathcal{N}$ and $\{0,1\}^*$ according to the correspondence

$$(0, \epsilon), (1, 0), (2, 1), (3, 00), (4, 01), \ldots$$

Here $\epsilon$ denotes the empty word with no letters. The length $l(x)$ of $x$ is the number of bits in the binary string $x$. For example, $l(010) = 3$ and $l(\epsilon) = 0$.
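The correspondence between natural numbers and binary strings listed above can be realized by writing $n + 1$ in binary and dropping the leading 1. The sketch below illustrates this; the function names are hypothetical and chosen only for the example.

```python
def nat_to_str(n):
    """Map a natural number to its binary string: 0 -> '', 1 -> '0', 2 -> '1', 3 -> '00', ..."""
    return bin(n + 1)[3:]  # strip the '0b' prefix and the leading '1'

def str_to_nat(s):
    """Inverse map: prepend a '1', read as binary, subtract one."""
    return int("1" + s, 2) - 1

pairs = [(n, nat_to_str(n)) for n in range(5)]
print(pairs)
```

Under this identification the length $l(x)$ of the string for the number $n$ is $\lfloor \log (n+1) \rfloor$.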

The emphasis is on binary sequences only for conve- 
nience; observations in every finite or countably infinite 
alphabet can be so encoded in a way that is 'theory neu- 
tral'. 

A binary string $x$ is a proper prefix of a binary string $y$ if we can write $y = xz$ for $z \neq \epsilon$. A set $\{x, y, \ldots\} \subseteq \{0,1\}^*$ is prefix-free if for every pair of distinct elements in the set neither is a proper prefix of the other. A prefix-free set is also called a prefix code. Each binary string $x = x_1 x_2 \ldots x_n$ has a special type of prefix code, called a self-delimiting code,

$$\bar{x} = x_1 x_1 x_2 x_2 \ldots x_n \neg x_n,$$

where $\neg x_n = 0$ if $x_n = 1$ and $\neg x_n = 1$ otherwise. This takes care of all strings of length $n \geq 1$. The empty string $\epsilon$ is encoded by $\bar{\epsilon} = 0$. This code is self-delimiting because we can determine where the code word $\bar{x}$ ends by reading it from left to right without backing up. Using this code we define the standard self-delimiting code for $x$ to be $x' = \overline{l(x)}\,x$. It is easy to check that $l(\bar{x}) = 2n$ and $l(x') = n + 2 \log n + 1$.

Let $\langle \cdot, \cdot \rangle$ be a standard one-one mapping from $\mathcal{N} \times \mathcal{N}$ to $\mathcal{N}$, for technical reasons chosen such that $l(\langle x, y \rangle) = l(y) + O(l(x))$. An example is $\langle x, y \rangle = \overline{l(x)}\,x\,y = x'y$. This can be iterated to $\langle \langle \cdot, \cdot \rangle, \cdot \rangle$.

Let $T_1, T_2, \ldots$ be a standard enumeration of all Turing machines, and let $\phi_1, \phi_2, \ldots$ be the enumeration of corresponding functions which are computed by the respective Turing machines. That is, $T_i$ computes $\phi_i$. These functions are the partial recursive functions or computable functions. The conditional complexity of $x$ given $y$ with respect to a Turing machine $T$ is

$$C_T(x \mid y) = \min_{p \in \{0,1\}^*} \{ l(p) : T(\langle p, y \rangle) = x \}.$$

Footnote: Use the following formula ([13], p. 10),

$$\log \binom{a}{b} = b \log \frac{a}{b} + (a - b) \log \frac{a}{a - b} + \frac{1}{2} \log \frac{a}{b(a - b)} + O(1).$$
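The self-delimiting code $\bar{x}$, the standard code $x'$, and the pairing $\langle x, y \rangle = x'y$ described above can be sketched as follows. This is an illustration under assumptions: the length $l(x)$ is written in ordinary binary notation (rather than via the number-string correspondence), $\bar{x}$ is only implemented for nonempty $x$, and all function names are hypothetical.

```python
def bar(x):
    """x-bar: every bit doubled, except that the last bit is followed by its negation."""
    if not x:
        raise ValueError("this sketch handles only nonempty strings")
    body = "".join(b + b for b in x[:-1])
    last = x[-1]
    return body + last + ("0" if last == "1" else "1")

def read_bar(stream, pos=0):
    """Decode one x-bar code word from left to right; return (x, next position)."""
    out = []
    while True:
        a, b = stream[pos], stream[pos + 1]
        pos += 2
        out.append(a)
        if a != b:  # a differing pair marks the last bit of x
            return "".join(out), pos

def prime(x):
    """x' = bar(l(x)) followed by x, with l(x) in ordinary binary (an assumption)."""
    return bar(format(len(x), "b")) + x

def pair(x, y):
    """<x, y> = x' y."""
    return prime(x) + y

def unpair(z):
    """Recover (x, y) from <x, y> without any separator symbol."""
    length_bits, pos = read_bar(z)
    n = int(length_bits, 2)
    return z[pos:pos + n], z[pos + n:]

print(bar("101"), prime("101"), pair("101", "0011"))
```

Decoding never looks past the end of the code word, which is the property that makes the code usable for concatenating a program with its input.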



The unconditional Kolmogorov complexity of $x$ with respect to $T$ is defined by $C_T(x) = C_T(x \mid \epsilon)$. Choose a universal Turing machine $U$ that expresses its universality in the following manner:

$$U(\langle \langle i, p \rangle, y \rangle) = T_i(\langle p, y \rangle)$$

for all $i$ and $\langle p, y \rangle$.

Theorem 9 (Invariance) There is a universal Turing machine $U$, such that for all machines $T$, there is a constant $c_T$ (the length of a self-delimiting description of the index of $T$ in the enumeration), such that for all $x$ and $y$ we have:

$$C_U(x \mid y) \leq C_T(x \mid y) + c_T.$$

For every pair $U, U'$ of universal Turing machines for which the theorem holds, there is a fixed constant $c_{U,U'}$, depending only on $U$ and $U'$, such that for all $x, y$ we have:

$$|C_U(x \mid y) - C_{U'}(x \mid y)| \leq c_{U,U'}.$$

To see this, substitute $U'$ for $T$ in the theorem, and, conversely, substitute $U'$ for $U$ and $U$ for $T$ in the theorem, and combine the two resulting inequalities. While the complexities according to $U$ and $U'$ are not exactly equal, they are equal up to a fixed constant for all $x$ and $y$. Therefore, one or the other fixed choice of reference universal machine $U$ yields resulting complexities that are in a fixed constant envelope from each other for all arguments.

Definition 5: We fix $U$ as our reference universal computer and define the conditional Kolmogorov complexity of $x$ given $y$ by

$$C(x \mid y) = \min_{p \in \{0,1\}^*} \{ l(p) : U(\langle p, y \rangle) = x \}.$$

The unconditional Kolmogorov complexity of $x$ is defined by $C(x) = C(x \mid \epsilon)$.

The Kolmogorov complexity $C(x)$ of $x$ is the length of the shortest binary program from which $x$ is computed. Though defined in terms of a particular machine model, the Kolmogorov complexity is machine-independent up to an additive constant and acquires an asymptotically universal and absolute character through Church's thesis, from the ability of universal machines to simulate one another and execute every effective process. The Kolmogorov complexity of an object can be viewed as an absolute and objective quantification of the amount of information in it. This leads to a theory of absolute information contents of individual objects in contrast to classical information theory which deals with average information to communicate objects produced by a random source [13].

Incompressibility: Since there is a Turing machine, say $T_i$, that computes the identity function $T_i(x \mid y) = x$ for all $y$, it follows that $C(x \mid y) \leq l(x) + c$ for fixed $c \leq 2 \log i + 1$ and all $x$.

It is easy to see that there are also strings that can be described by programs much shorter than themselves. For instance, the function defined by $f(1) = 2$ and $f(i) = 2^{f(i-1)}$ for $i > 1$ grows very fast, $f(k)$ is a "stack" of $k$ twos. Yet for every $k$ it is clear that $f(k)$ has complexity at most $C(k) + O(1)$. What about incompressibility? For every $n$ there are $2^n$ binary strings of length $n$, but only $\sum_{i=0}^{n-1} 2^i = 2^n - 1$ descriptions in binary string format of length less than $n$. Therefore, there is at least one binary string $x$ of length $n$ such that $C(x) \geq n$. We call such strings incompressible. The same argument holds for conditional complexity: since for every length $n$ there are at most $2^n - 1$ binary programs of length $< n$, for every binary string $y$ there is a binary string $x$ of length $n$ such that $C(x \mid y) \geq n$. "Randomness deficiency" measures how far the object falls short of the maximum possible Kolmogorov complexity. For every constant $\delta$ we say a string $x$ has randomness deficiency at most $\delta$ if $C(x) \geq l(x) - \delta$. Strings that are incompressible (say, with small randomness deficiency) are patternless, since a pattern could be used to reduce the description length. Intuitively, we think of such patternless sequences as being random, and we use "random sequence" synonymously with "incompressible sequence." (It is possible to
give a rigorous formalization of the intuitive notion of a ran- 
dom sequence as a sequence that passes all effective tests 
for randomness, see for example [T^].) 

Since there are few short programs, there can be only few objects of low complexity: the number of strings of length $n$ that have randomness deficiency at most $\delta$ is at least $2^n - 2^{n-\delta} + 1$. Hence there is at least one string of length $n$ with randomness deficiency 0, at least one-half of all strings of length $n$ have randomness deficiency at most 1, at least three-fourths of all strings of length $n$ have randomness deficiency at most 2, and at least the $(1 - 1/2^{\delta})$th part of all $2^n$ strings of length $n$ have randomness deficiency at most $\delta$.
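The counting argument does not depend on which machine is used, only on the number of short programs. The sketch below makes this concrete with a toy decoder; the decoder is a hypothetical stand-in for a Turing machine (any map from programs to $n$-bit strings would do), and the names are illustrative.

```python
import itertools
import random

def lower_bound(n, delta):
    """Guaranteed number of n-bit strings with complexity at least n - delta."""
    return 2 ** n - 2 ** (n - delta) + 1

def toy_decoder(program, n):
    """Hypothetical machine: sends each program to some arbitrary n-bit string."""
    rng = random.Random(program)
    return format(rng.getrandbits(n), "b").zfill(n)

def count_without_short_program(n, delta):
    """Number of n-bit strings that no program shorter than n - delta outputs."""
    described = set()
    for length in range(n - delta):
        for bits in itertools.product("01", repeat=length):
            described.add(toy_decoder("".join(bits), n))
    return 2 ** n - len(described)

n = 10
results = {d: (count_without_short_program(n, d), lower_bound(n, d)) for d in range(4)}
for d, (count, bound) in results.items():
    print(d, count, bound)
```

Whatever the decoder does, the count can never fall below the bound, because there are only $2^{n-\delta} - 1$ programs of length less than $n - \delta$.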

Lemma 6: Let $\delta$ be a positive integer. For every fixed $y$, every set $S$ of cardinality $m$ has at least $m(1 - 2^{-\delta}) + 1$ elements $x$ with $C(x \mid y) \geq \lfloor \log m \rfloor - \delta$.

Proof: There are $N = \sum_{i=0}^{n-1} 2^i = 2^n - 1$ binary strings of length less than $n$. A fortiori there are at most $N$ elements of $S$ that can be computed by binary programs of length less than $n$, given $y$. This implies that at least $m - N$ elements of $S$ cannot be computed by binary programs of length less than $n$, given $y$. Substituting $n$ by $\lfloor \log m \rfloor - \delta$
together with Definition || yields the lemma. ■ 
If we are given S as an explicit table then we can simply 
enumerate its elements (in, say, lexicographical order) us- 
ing a fixed program not depending on S or y. Such a fixed 
program can be given in $O(1)$ bits. Hence we can upper bound the complexity as $C(x \mid S, y) \stackrel{+}{<} \log |S|$.

Incompressibility Method: One reason to formulate 
a notion of quantum Kolmogorov complexity, apart from its 
interpretation as the information in an individual quantum 
state, is the following. We hope to duplicate the success
of the classical version as a proof method, the incompress- 
ibility method, in the theory of computation and combi- 
natorics |ll| : In a typical proof using the incompressibility 
method, one first chooses an incompressible object from the 
class under discussion. The argument invariably says that 
if a desired property does not hold, then in contrast with 
the assumption, the object can be compressed. This yields 
the required contradiction. Since most objects are almost 
incompressible, the desired property usually also holds for 
almost all objects, and hence on average. The hope is that 
one can use the quantum Kolmogorov complexity to show, 
for example, lower bounds on the complexity of quantum 
computations. 

Prefix Kolmogorov complexity: For technical reasons we also need a variant of complexity, so-called prefix complexity, which is associated with Turing machines for
which the set of programs resulting in a halting compu- 
tation is prefix-free. We can realize this by equipping 
the Turing machine with a read-only input tape which 
is read from left-to-right without backing up, a separate 
read/write work tape, an auxiliary read-only input tape, 
and a write-only output tape that is written from left- 
to-right without backing up. All tapes are one-way in- 
finite. Such Turing machines are called prefix machines 
since the set of halting programs for such a machine forms 
a prefix- free set. Taking the universal prefix machine U 
we can define the prefix complexity analogously with the 
plain Kolmogorov complexity. Let x* be the shortest pro- 
gram for x that is enumerated first in a fixed general enu- 
meration process (say, by dovetailing the running of all 
candidate programs) of all programs for which the reference universal prefix machine computes $x$. Then, the set $\{x^* : U(x^*) = x, x \in \{0,1\}^*\}$ is a prefix code. That is, if $x^*$ and $y^*$ are code words for $x$ and $y$, respectively, with $x \neq y$, then $x^*$ is not a prefix of $y^*$.

Let $\langle \cdot \rangle$ be a standard invertible effective one-one encoding from $\mathcal{N} \times \mathcal{N}$ to a prefix-free recursive subset of $\mathcal{N}$. For example, we can set $\langle x, y \rangle = x'y'$. We insist on prefix-freeness and recursiveness because we want a universal Turing machine to be able to read an image under $\langle \cdot \rangle$ from left to right and determine where it ends. Let $P_1, P_2, \ldots$ be a standard enumeration of all prefix machines, and let $\phi_1, \phi_2, \ldots$ be the enumeration of corresponding functions that are computed: $P_i$ computes $\phi_i$. It is easy to see that (up to the prefix-free encoding) these functions are exactly the partial recursive functions or computable functions. The conditional complexity of $x$ given $y$ with respect to a prefix machine $P$ is

$$K_P(x \mid y) = \min_{p \in \{0,1\}^*} \{ l(p) : P(\langle p, y \rangle) = x \}.$$

The unconditional complexity of $x$ with respect to $P$ is defined by $K_P(x) = K_P(x \mid \epsilon)$. Choose a universal prefix machine $UP$ that expresses its universality in the following manner:

$$UP(\langle \langle i, p \rangle, y \rangle) = P_i(\langle p, y \rangle)$$

for all i and p, y. Proving the Invariance Theorem for prefix 
machines goes by the same reasoning as before. Then, we 
can define: 

Definition 6: Fix a UP as above as our reference univer- 
sal prefix computer, and define the conditional prefix com- 
plexity of x given y by 

$$K(x \mid y) = \min_{p \in \{0,1\}^*} \{ l(p) : UP(\langle p, y \rangle) = x \}.$$

The unconditional Kolmogorov complexity of x is defined 
by K(x) = K(x\e). 

Note that $K(x \mid y)$ can be slightly larger than $C(x \mid y)$, but for all $x, y$ we have

$$C(x \mid y) \leq K(x \mid y) \leq C(x \mid y) + 2 \log C(x \mid y).$$

For example, the incompressibility laws hold also for $K(x)$ but in slightly different form. The nice thing about $K(x)$ is that we can interpret $2^{-K(x)}$ as a probability distribution since $K(x)$ is the length of a shortest prefix-free program for $x$. By the fundamental Kraft's inequality, see for example @, @, we know that if $l_1, l_2, \ldots$ are the code-word lengths of a prefix code, then $\sum_x 2^{-l_x} \leq 1$. This leads to the notion of the "universal distribution" $m(x) = 2^{-K(x)}$ that assigns high probability to simple objects (that is, with low prefix complexity) and low probability to complex objects (that is, with high prefix complexity), a rigorous form of Occam's Razor.
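Kraft's inequality can be checked for a concrete prefix code. The sketch below uses the code-word lengths of the standard self-delimiting code $x'$, under the assumption that $l(x)$ is written in ordinary binary so that $l(x') = l(x) + 2\,l(\text{binary}(l(x)))$; the function names are hypothetical. It sums $2^{-l(x')}$ over all strings up to a given length using exact fractions.

```python
from fractions import Fraction

def code_length(n):
    """Length of x' for any x with l(x) = n (l(x) written in ordinary binary)."""
    return n + 2 * len(format(n, "b"))

def kraft_sum(max_len):
    """Sum of 2^(-l(x')) over all binary strings x with l(x) <= max_len."""
    total = Fraction(0)
    for n in range(max_len + 1):
        # there are 2**n strings of length n, each with the same code length
        total += Fraction(2 ** n, 2 ** code_length(n))
    return total

for max_len in (1, 7, 30):
    print(max_len, kraft_sum(max_len))
```

The partial sums increase with the maximal length but stay below 1, as the inequality requires for every prefix code.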

II. Appendix: Quantum Turing Machines

We base quantum Kolmogorov complexity on quantum 
Turing machines. The simplest way to explain the idea of
quantum computation is perhaps by way of probabilistic
(randomized) computation. This we explain here. Then, 
the definition of the quantum (prefix) Turing machine is 
given in the main text in Sectionsect. model. 

A. Notation 

For every $N$ the finite-dimensional Hilbert space $\mathcal{H}_N$ has a canonical basis $|e_0\rangle, \ldots, |e_{N-1}\rangle$. Assume that the canonical basis of $\mathcal{H}_N$ is also the beginning of the canonical basis of $\mathcal{H}_{N+1}$. The $m$-fold tensor product $\bigotimes^{m} \mathcal{H}$ of a Hilbert space $\mathcal{H}$ is denoted by $\mathcal{H}^{\otimes m}$.

A pure quantum state $\phi$ represented as a unit length vector in such a Hilbert space is denoted as $|\phi\rangle$ and the corresponding element of the dual space (the conjugate transpose) is written as $\phi^{\dagger}$ or $\langle \phi|$. The inner product of $\langle \phi|$ and $|\psi\rangle$ is written in physics notation as $\langle \phi \mid \psi \rangle$ and in mathematics notation as $\phi^{\dagger}\psi$. The "bra-ket" notation is due to P. Dirac and is the standard quantum mechanics notation. The "bra" $\langle x|$ denotes a row vector with complex entries, and "ket" $|x\rangle$ is the column vector consisting of the conjugate transpose of $\langle x|$ (columns interchanged with rows and the imaginary part of the entries negated, that is, $\sqrt{-1}$ is replaced by $-\sqrt{-1}$).

Of special importance is the two-dimensional Hilbert space $\mathcal{C}^2$, where $\mathcal{C}$ is the set of complex numbers, and $|0\rangle, |1\rangle$ is its canonical orthonormal basis. An element of $\mathcal{C}^2$ is called a qubit (quantum bit in analogy with an element of $\{0,1\}$ which is called a bit for "binary digit"). To generalize this to strings of $n$ qubits, we consider the quantum state space $\mathcal{C}^N$ with $N = 2^n$. The basis vectors $e_0, \ldots, e_{N-1}$ of this space are parametrized by binary strings of length $n$, so that $e_0$ is shorthand for $e_{0 \ldots 0}$ and $e_{N-1}$ is shorthand for $e_{1 \ldots 1}$. Mathematically, $\mathcal{C}^N$ is decomposed into a tensor product of $n$ copies of $\mathcal{C}^2$, written as $(\mathcal{C}^2)^{\otimes n}$, and an $n$-qubit state $|a_1 \ldots a_n\rangle$ in bra-ket notation can also be written as the tensor product $|a_1\rangle \otimes \ldots \otimes |a_n\rangle$, or shorthand as $|a_1\rangle \ldots |a_n\rangle$, a string of $n$ qubits, the qubits being distinguished by position.

B. Probabilistic Computation 

Consider the well known probabilistic Turing machine which is just like an ordinary Turing machine, except that at each step the machine can make a probabilistic move which consists in flipping a (say fair) coin and depending on the outcome changing its state to either one of two alternatives. This means that at each such probabilistic move the computation of the machine splits into two distinct further computations each with probability $\frac{1}{2}$. Ignoring the deterministic computation steps, a computation involving $m$ coin flips can be viewed as a binary computation tree of depth $m$ with $2^m$ leaves, where the set of nodes at level $t \leq m$ correspond to the possible states of the system after $t$ coin flips, every state occurring with probability $1/2^t$. For convenience, we can label the edges connecting a state $x$ directly with a state $y$ with the probability that a state $x$ changes into state $y$ in a single coin flip (in this example all edges are labeled '$\frac{1}{2}$').

For instance, given an arbitrary Boolean formula containing $m$ variables, a probabilistic machine can flip its coin $m$ times to generate each of the $2^m$ possible truth assignments at the $m$-level nodes, and subsequently check in each node deterministically whether the local assignment makes the formula true. If there are $k$ distinct such assignments then the probabilistic machine finds that the formula is satisfiable with probability at least $k/2^m$, since there are $k$ distinct computation paths leading to a satisfying assignment.

Now suppose the probabilistic machine is hidden in a black box and the computation proceeds without us knowing the outcomes of the coin flips. Suppose that after $m$ coin flips we open part of the black box and observe the bit which denotes the truth assignment for variable $x_5$ ($5 \leq m$). Before we opened the black box all $2^m$ initial truth assignments to variables $x_1, \ldots, x_m$ were still equally possible, each with probability $1/2^m$. After we observed the state of variable $x_5$, say 0, the probability space of possibilities has collapsed to the truth assignments which consist of all binary vectors with a 0 in the 5th position each of which has probability renormalized to $1/2^{m-1}$.
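The probabilistic picture above can be reproduced by enumerating the leaves of the computation tree. The sketch below is only an illustration: the three-variable formula and the choice of which variable is observed are hypothetical examples, not taken from the text.

```python
from itertools import product

def formula(x1, x2, x3):
    """Hypothetical example formula: (x1 or x2) and (not x1 or x3)."""
    return (x1 or x2) and ((not x1) or x3)

m = 3
# The 2^m leaves of the computation tree, one per sequence of coin flips.
assignments = list(product([0, 1], repeat=m))

# k satisfying assignments give success probability k / 2^m.
k = sum(1 for a in assignments if formula(*a))
success_probability = k / 2 ** m

# Observing one variable (here x2 = 0) collapses the space to the
# consistent assignments, each with renormalized probability 1 / 2^(m-1).
observed_index, observed_value = 1, 0
remaining = [a for a in assignments if a[observed_index] == observed_value]
renormalized = 1 / len(remaining)

print(k, success_probability, len(remaining), renormalized)
```

Observation here only removes possibilities and rescales the rest; no two computation paths ever cancel, which is the point of contrast with the quantum case.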

C. Quantum Computation 

A quantum Turing machine can be viewed as a generalization of the probabilistic Turing machine. Consider the same computation tree. In the probabilistic computation there is a probability $p_i \geq 0$ associated with each node $i$ (state of the system) at the same level in the tree, such that $\sum p_i = 1$, summed over the nodes at the same level. In a quantum mechanical computation there is a "probability amplitude" $\alpha_i$ associated with each basis state $|i\rangle$ of the system. Ignore for the moment the quantum equivalent of the probabilistic coin flip to produce the computation tree. Consider the simple case (corresponding to the probabilistic example of the states of the nodes at the $m$th level of the computation tree) where $i$ runs through the classical values 0 through $2^m - 1$, in the quantum case represented by the orthonormal basis $m$-qubit states $|00 \ldots 0\rangle$ through $|11 \ldots 1\rangle$. Then, the nodes at level $m$ are in a superposition $|\psi\rangle = \sum_{i \in \{0,1\}^m} \alpha_i |i\rangle$ with the probability amplitudes satisfying $\sum_{i \in \{0,1\}^m} \|\alpha_i\|^2 = 1$.

The amplitudes are complex numbers satisfying $\sum \|\alpha_i\|^2 = 1$, where if $\alpha_i = a + b\sqrt{-1}$ then $\|\alpha_i\| = \sqrt{a^2 + b^2}$, and the summation is taken over all distinct states of the observable at a particular instant. We say "distinct" states since the quantum mechanical calculus dictates that equal states are grouped together: If state $|\phi\rangle$ of probability amplitude $\alpha$ equals state $|\psi\rangle$ of probability amplitude $\beta$, then their combined contribution in the sum is $\|\alpha + \beta\|^2 |\phi\rangle$. The transitions are governed by a matrix $U$ which represents the program being executed. Such a program has to satisfy the following constraints. Denote the set of possible configurations of the Turing machine by $X$, where $X$ is the set of $m$-bit column vectors (the basis states) for simplicity. Then $U$ maps the column vector $\alpha = (\alpha_x)_{x \in X}$ to $U\alpha$. Here $\alpha$ is a $(2^m)$-element complex vector of amplitudes of the quantum superposition of the $2^m$ basis states before the step, and $U\alpha$ the same after the step concerned. The special property which $U$ needs to satisfy in quantum mechanics is that it is unitary, that is, $U^{\dagger}U = I$ where $I$ is the identity matrix and $U^{\dagger}$ is the conjugate transpose of $U$ (as with the bra-ket, "conjugate" means that all $\sqrt{-1}$'s are replaced by $-\sqrt{-1}$'s and "transpose" means that the rows and columns are interchanged). In other words, $U$ is unitary iff $U^{\dagger} = U^{-1}$.

The unitary constraint on the evolution of the computa- 
tion enforces two facts. 

1. If $U^0\alpha = \alpha$ and $U^t = UU^{t-1}$ then $\sum_{x \in X} \|(U^t\alpha)_x\|^2 = 1$ for all $t$ (discretizing time for convenience).

2. A quantum computation is reversible (replace $U$ by $U^{\dagger} = U^{-1}$). In particular this means that a computation $U^t\alpha = \alpha_t$ is undone by running the computation stepwise in reverse: $U^{\dagger t}\alpha_t = \alpha$.

The quantum version of a single bit is a superposition of the two basis states of a classical bit:

$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle,$$

where $\|\alpha\|^2 + \|\beta\|^2 = 1$. Such a state is called a quantum bit or qubit. It consists of partially the basis state $|0\rangle$ and partially the basis state $|1\rangle$. The states are denoted by the column vectors of the appropriate complex probability amplitudes. For the basis states the vector notations are: $|0\rangle = \binom{1}{0}$ (that is, $\alpha = 1$ and $\beta = 0$), and $|1\rangle = \binom{0}{1}$ (that is, $\alpha = 0$ and $\beta = 1$). We also write $|\psi\rangle$ as the column vector $\binom{\alpha}{\beta}$.

Physically, for example, the state $|\psi\rangle$ can be the state of a polarized photon, and the basis states are horizontal or vertical polarization, respectively. Upon measuring according to the basis states, that is, passing the photon through a medium that is polarized either in the horizontal or vertical orientation, the photon is observed with probability $\|\alpha\|^2$ or probability $\|\beta\|^2$, respectively. Consider a sample computation on a one-bit computer executing the unitary operator:

$$S = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}. \qquad (8)$$

It is easy to verify, using common matrix calculation, that

$$S|0\rangle = \frac{1}{\sqrt{2}}|0\rangle - \frac{1}{\sqrt{2}}|1\rangle, \quad S|1\rangle = \frac{1}{\sqrt{2}}|0\rangle + \frac{1}{\sqrt{2}}|1\rangle,$$

$$S^2|0\rangle = 0|0\rangle - 1|1\rangle = -|1\rangle, \quad S^2|1\rangle = 1|0\rangle + 0|1\rangle = |0\rangle.$$



If we observe the computer in state $S|0\rangle$, then the probability of observing state $|0\rangle$ is $(\frac{1}{\sqrt{2}})^2 = \frac{1}{2}$ and the probability to observe $|1\rangle$ is $(-\frac{1}{\sqrt{2}})^2 = \frac{1}{2}$. However, if we observe the computer in state $S^2|0\rangle$, then the probability of observing state $|0\rangle$ is 0, and the probability to observe $|1\rangle$ is 1. Similarly, if we observe the computer in state $S|1\rangle$, then the probability of observing state $|0\rangle$ is $(\frac{1}{\sqrt{2}})^2 = \frac{1}{2}$, and the probability to observe $|1\rangle$ is $(\frac{1}{\sqrt{2}})^2 = \frac{1}{2}$. If we observe the computer in state $S^2|1\rangle$, then the probability of observing state $|0\rangle$ is 1, and the probability to observe $|1\rangle$ is 0. Therefore, the operator $S$ inverts a bit when it is applied twice in a row, and hence has acquired the charming name square root of 'not'. In contrast, with the analogous probabilistic calculation, flipping a coin two times in a row, we would have found that the probability of each computation path in the complete binary computation tree of depth 2 was $\frac{1}{4}$, and the states at the four leaves of the tree were $|0\rangle, |1\rangle, |0\rangle, |1\rangle$, resulting in a total probability of observing $|0\rangle$ being $\frac{1}{2}$, and the total probability of observing $|1\rangle$ being $\frac{1}{2}$ as well.
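The contrast between applying $S$ twice and flipping a fair coin twice can be reproduced with a few lines of matrix arithmetic. In the sketch below the matrix `S` is the one determined by the stated actions on $|0\rangle$ and $|1\rangle$ (columns are the images of the basis states); the stochastic matrix `COIN` and the helper names are assumptions made for the illustration.

```python
import math

R = 1 / math.sqrt(2)
S = [[R, R], [-R, R]]                # column 0 is S|0>, column 1 is S|1>
COIN = [[0.5, 0.5], [0.5, 0.5]]      # fair coin flip on a probability vector

def apply(matrix, vector):
    """Matrix-vector product for 2x2 matrices."""
    return [sum(matrix[r][c] * vector[c] for c in range(2)) for r in range(2)]

def probabilities(amplitudes):
    """Observation probabilities are squared norms of the amplitudes."""
    return [abs(a) ** 2 for a in amplitudes]

ket0 = [1.0, 0.0]
once = apply(S, ket0)                # amplitudes (1/sqrt2, -1/sqrt2)
twice = apply(S, once)               # amplitudes (0, -1): the bit is inverted
classical_twice = apply(COIN, apply(COIN, ket0))

print(probabilities(once), probabilities(twice), classical_twice)
```

The amplitudes of the two paths leading to $|0\rangle$ cancel in the quantum case, while the corresponding classical probabilities simply add.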

The quantum principle involved in the above example is 
called interference, similar to the related light phenomenon 
in the seminal "two slit experiment:" If we put a screen 
with a single small enough hole in between a light source 
and a target, then we observe a gradually dimming illumination of the target screen, the brightest spot being colinear with the light source and the hole. If we put a screen
with two small holes in between, then we observe a diffrac- 
tion pattern of bright and dark stripes due to interference. 
Namely, the light hits every point on the screen via two 
different routes (through the two different holes). If the 
two routes differ by an even number of half wave lengths, 
then the wave amplitudes at the target are added, resulting 
in twice the amplitude and a bright spot, and if they dif- 
fer by an odd number of half wave lengths then the wave 
amplitudes are in opposite phase and are subtracted resulting in zero and a dark spot. Similarly, with quantum computation, if the quantum state is $|\psi\rangle = \alpha|x\rangle + \beta|y\rangle$, then for $x = y$ we have a probability of observing $|x\rangle$ of $\|\alpha + \beta\|^2$, rather than $\|\alpha\|^2 + \|\beta\|^2$ which it would have
been in a probabilistic fashion. For example, if $\alpha = \frac{1}{2}$ and $\beta = -\frac{1}{2}$ then the probability of observing $|x\rangle$ is 0 rather than $\frac{1}{2}$, and with the sign of $\beta$ inverted we observe $|x\rangle$ with probability 1.

D. Quantum Algorithmics 

A quantum algorithm corresponds to a unitary transfor- 
mation U that is built up from elementary unitary transfor- 
mations, every one of which only acts on one or two qubits. 
The algorithm applies U to an initial classical state con- 
taining the input and then makes a final measurement to 
extract the output from the final quantum state. The algo- 
rithm is "efficient" if the number of elementary operations 
is "small" , which usually means at most polynomial in the 
length of the input. Quantum computers can do everything 
a classical computer can do probabilistically — and more. 

We are now in the position to explain the quantum equivalent of a probabilistic coin flip as promised in Section B-C. This is a main trick enhancing the power of quantum computation. A sequence of $n$ fair coin flips "corresponds" to a sequence $H_n$ of $n$ one-qubit unitary operations, the Hadamard transform,

$$H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix},$$

on the successive bits of a register of $n$ bits originally in the all-0 state $|\psi\rangle = |00 \ldots 0\rangle$. The result is a superposition

$$H_n|\psi\rangle = \sum_{x \in \{0,1\}^n} 2^{-n/2} |x\rangle$$

of all the $2^n$ possible states of the register, each with amplitude $2^{-n/2}$ (and hence probability of being observed of $2^{-n}$).

The Hadamard transform is ubiquitous in quantum computing; its singlefold action is similar to that of the transform $S$ of (8) with the roles of "0" and "1" partly interchanged. In contrast to $S^2$ that implements the logical "not," we have $H^2 = I$ with $I$ the identity matrix.

Subsequent to application of $H_n$, the computation proceeds in parallel along the exponentially many computation paths in quantum coherent superposition. A sequence of
tricky further unitary operations, for example the "quan- 
tum Fourier transform," and observations serves to exploit 
interference (and so-called entanglement) phenomena to ef- 
fect a high probability of eventually observing outcomes 
that allow us to determine the desired result, and suppress- 
ing the undesired spurious outcomes. 
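The action of the Hadamard transform on each qubit of an all-0 register, described above, can be simulated directly on the amplitude vector. The sketch below is an illustration with hypothetical helper names; it applies the one-qubit gate to one position at a time and checks nothing beyond what the text states.

```python
import math

R = 1 / math.sqrt(2)
H = [[R, R], [R, -R]]

def apply_to_qubit(gate, state, q, n):
    """Apply a one-qubit gate to qubit q (0 = leftmost) of an n-qubit amplitude vector."""
    shift = n - 1 - q
    new_state = [0.0] * len(state)
    for index, amp in enumerate(state):
        bit = (index >> shift) & 1
        base = index & ~(1 << shift)      # index with qubit q cleared
        for out in (0, 1):
            new_state[base | (out << shift)] += gate[out][bit] * amp
    return new_state

def hadamard_all(state, n):
    """Apply H to each of the n qubits in turn."""
    for q in range(n):
        state = apply_to_qubit(H, state, q, n)
    return state

n = 3
start = [1.0] + [0.0] * (2 ** n - 1)      # the all-0 state |00...0>
uniform = hadamard_all(start, n)           # every amplitude equals 2^(-n/2)
back = hadamard_all(uniform, n)            # H applied twice per qubit is the identity

print(uniform)
```

Each of the $2^n$ basis states is then observed with probability $2^{-n}$, matching $n$ fair coin flips, and a second round of Hadamard transforms restores the all-0 state.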

One principle that is used in many quantum algorithms 
is as follows. If A is a classical algorithm for computing 
some function $f$, possibly even irreversible like $f(x) = x \pmod 2$, then we can turn it into a unitary transformation which maps classical state $|x, 0\rangle$ to $|x, f(x)\rangle$. Note that we can apply $A$ to a superposition of all $2^n$ inputs:

$$A\left(2^{-n/2} \sum_x |x, 0\rangle\right) = 2^{-n/2} \sum_x |x, f(x)\rangle.$$

In some sense this state contains the results of computing $f$ for all possible inputs $x$, but we have only applied $A$ once to obtain it. This effect together with the interference
phenomenon is responsible for one of the advantages of 
quantum over classical randomized computing and is called 
quantum parallelism. 
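The mapping $|x, 0\rangle \to |x, f(x)\rangle$ can be made reversible in a standard way by XOR-ing $f(x)$ into the second register, which permutes the basis states. The sketch below assumes this XOR construction (the text does not fix a particular one) and represents a state as a dictionary from basis labels to amplitudes; all names are hypothetical.

```python
def f(x):
    """The irreversible example function from the text: x mod 2."""
    return x % 2

def oracle(state):
    """Map |x, b> to |x, b XOR f(x)>; a permutation of basis states, hence unitary."""
    return {(x, b ^ f(x)): amp for (x, b), amp in state.items()}

n = 3
# Uniform superposition over all 2^n inputs with the output register at 0.
superposition = {(x, 0): 2 ** (-n / 2) for x in range(2 ** n)}
result = oracle(superposition)

for (x, fx), amp in sorted(result.items()):
    print(x, fx, round(amp, 4))
```

A single application of the oracle produces a state whose basis labels carry $f(x)$ for every input $x$, while applying it a second time restores the original superposition.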

This leaves the question of how the input to a computa- 
tion is provided and how the output is obtained. Generally, 
we restrict ourselves to the case where the quantum com- 
puter has a classical input. If the input x has k bits, and 
the number of qubits used by the computation is $n \geq k$ (input plus work space), then we pad the input with nonsignificant 0's and start the quantum computation in an initial state (which must be in $\mathcal{C}^N$) $|x0 \ldots 0\rangle$. When the computation finishes the resulting state is a unit vector in $\mathcal{C}^N$, say $\sum_i \alpha_i |i\rangle$ where $i$ runs through $\{0,1\}^n$ and the probability amplitudes $\alpha_i$'s satisfy $\sum_i \|\alpha_i\|^2 = 1$. The output is obtained by performing a measurement with as possible outcomes the basis vectors. The observed output is probabilistic: we observe basis vector $|i\rangle$, that is, the $n$-bit string $i$, with probability $\|\alpha_i\|^2$.
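The final measurement can be simulated classically by sampling a basis index with weights $\|\alpha_i\|^2$. The sketch below is only an illustration: the two-qubit amplitude vector, the seed, and the function name are hypothetical choices.

```python
import random

def measure(amplitudes, n, rng):
    """Sample an n-bit outcome i with probability ||alpha_i||^2."""
    weights = [abs(a) ** 2 for a in amplitudes]
    index = rng.choices(range(len(amplitudes)), weights=weights)[0]
    return format(index, "b").zfill(n)

# Hypothetical final state 0.6|00> + 0.8i|11> (squared norms 0.36 and 0.64).
amplitudes = [0.6, 0.0, 0.0, 0.8j]
rng = random.Random(0)
samples = [measure(amplitudes, 2, rng) for _ in range(1000)]

print(samples.count("00"), samples.count("11"))
```

Only basis strings with nonzero amplitude are ever observed, and their relative frequencies approach the squared norms of the amplitudes as the number of repetitions grows.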